ABSTRACT


Literature pertaining to Voice Recognition abounds with
information relevant to the assessment of transitory speech
recognition devices. In the past, engineering requirements
have dictated the path this technology followed. But other
factors do exist that influence recognition accuracy. This
thesis explores the impact of Human Factors on the
successful recognition of speech, principally addressing the
differences or variability among users. A Threshold
Technology T-600 was used for a 100 utterance vocabulary to
test 44 subjects. A statistical analysis was conducted on 5
generic categories of Human Factors: Occupational,
Operational, Psychological, Physiological and Personal. How
the equipment is trained and the experience level of the
speaker were found to be key characteristics influencing
recognition accuracy. To a lesser extent computer
experience, time of week, accent, vital capacity and rate of
air flow, speaker cooperativeness and anxiety were found to
affect overall error rates.
I.   INTRODUCTION

The insistence and dependence upon state of the art
equipment has been a predominant characteristic throughout
the efforts within the Command and Control community.
Despite the penchant for newer, better, and more
sophisticated equipment, there must exist some measure of
emphasis on the personnel needed to train with, operate on,
and maintain the readiness of, such equipment. Personnel
considerations cannot be divorced from test programs
designed to identify optimal systems or equipment. When
these considerations are carefully examined, then the data
obtained from such programs can be effectively used to
enhance personnel subsystem design and implementation.

A  personnel  subsystem  test  program  is  one  which  places 
the  requisite  emphasis  on  personnel  rather  than  equipment. 
Kryter  [Ref.  1]  enumerates  six  objectives  necessary  for  a 
successful  test  program. 

1. To evaluate whether the system can be operated,
maintained and controlled by the personnel assigned to
it.

2. To determine the effect of human performance on system
performance and vice versa. This objective is aimed
at discovering critical inadequacies in man-machine
interaction and subsequently identifying changes that
would improve their compatibility.

3. To develop valid qualitative and quantitative
personnel requirements, selection procedures, and
tables of organizational manning. How many and what
type of people will provide optimal effectiveness of
the man-machine interface?

4. To evaluate individual and/or long term operational
readiness and applicable training programs.

5. To evaluate training equipment and supporting
materials.

6. To evaluate job aids, technical publications and other
tools for training and for assisting on-the-job
performance.

Increased productivity through automation involves two
major issues: technological and human. Speech is a uniquely
human capability. Speech recognition by a computer involves
getting a machine to accept, recognize, and correctly
respond to spoken messages. This machine must take the
input speech, compare it against the expected pronunciation
for allowable utterances, identify the intended message or
utterance, and produce the correct and appropriate response.
To adequately implement the capabilities of such a
technology, the objectives above become all the more
relevant. Of paramount importance is the human, for it
takes people to make all this automation work.

Speech recognizers commercially available today are
effective only within narrow limits. They have relatively
small vocabularies and 'frequently' confuse words. Within
this context, it becomes incumbent upon the user to develop
the skill to talk to the recognizer [Ref. 2: p. 26]. As
such, a recognizer's performance will vary widely from
speaker to speaker.

Much of the work in speech recognition has centered on
the development and improvement of speech recognition
devices, for example:

— Linear Predictive Coding (LPC) in the early '70s

— Dynamic programming

— Development of 1 million bit/sec processors

A user's experience notwithstanding, the human variable in
recognition performance remains strong. This has often been
observed in the past and even led to a description of user
categories [Ref. 2: p. 20] as 'sheep' and 'goats'. These
speech recognition systems work well for the 'sheep' but the
majority of the problems are created by a small segment of
the population: the 'goats'.

Recognizing the significant impact that engineers have
had on perpetuating the continued advent and technological
advancement of speech recognition, it is nevertheless
critical to remind ourselves of the interdisciplinary nature
of speech recognition. Besides engineering, the total
discipline of speech sciences and technology includes such
traditional disciplines as psychology, linguistics, anatomy
and physiology, computer sciences and human factors. This
thesis endeavors to examine the impact of human factors on
the successful recognition of speech, principally addressing
the differences or variability among users.

First, the modality of voice input will be examined,
citing some of the more readily apparent advantages and
disadvantages, and an overview provided as to its potential
applicability in a Command and Control environment. With a
general appreciation of speech recognition (the term 'voice
recognition' is synonymous and used interchangeably within
this document) in hand, the variety of human factors that
can affect the successful recognition of speech by a machine
will then be summarized. Subsequently, the experimental
methodology used to examine and differentiate speech
recognition equipment users will be presented. Lastly, the
experimental results will be presented and an analysis
provided of the correlation of each variable examined to its
associated error rates as well as an analysis of variance.
II.   COMPUTER RECOGNITION OF SPEECH

A.   OVERVIEW OF VOICE INPUT TECHNOLOGY

Speech recognition can be considered as a subset of a
broader field known as Speech Understanding. Speech
Understanding Systems (SUS) have the objective of
interpreting the intent of the speaker whether or not the
user's speech is grammatically correct or well formed.
While Speech Recognition Systems (SRS) are primarily
interested in the correct recognition of every word, SUS are
concerned with the meaning of entire conversational
segments.

Until now the only significant undertaking has been the
ARPA SUR project [Ref. 3], a five year effort with the
objective of obtaining a breakthrough in speech
understanding capability that would then allow the
development of practical man-machine communication systems.
Specifically, the objectives were to develop a SUS that
would accept continuous speech from many cooperative
speakers of a general American public; a system which used
syntactic analysis, semantics, pragmatic information and
prosodics to acquire an appropriate computer response.

The goals of speech recognition, in contrast, are less
ambitious. Instead of abstract concepts such as meaning or
understanding, SRS try to solve the more practical problems
of analyzing the acoustic waveform and applying pattern
recognition techniques in order to differentiate between
utterances [Ref. 4]. Figure 1 illustrates a typical speech
recognition model.

The acoustic speech signal is first analyzed to extract
such acoustic parameters as frequency spectrum and the
energy in different time segments. Next, information
carrying features are extracted that define various phonetic
events such as how noisy (fricative-like) the signal is,
positions of different vowel-like sounds and vibration of
the speaker's vocal cords. This information is then used to
divide the speech into time slices or segments, which are
labelled with phonetic categories. The phonetic sequence
for the input speech is matched to stored sequences of
expected pronunciations for the words in the lexicon or
dictionary, and the best matching sequences are determined
to be the most likely word(s) that had occurred in the speech.

Speech recognition systems can be considered as
belonging to one of two categories: continuous (connected)
or isolated (discrete) speech systems. Continuous systems
are those which can extract information from strings of
words even though the words run together as in natural
speech. Isolated systems require a short pause before and
after utterances that are to be recognized as entities. The
minimum duration of a pause is typically between 100-200
msec. An isolated word recognizer is also limited in the
duration of the spoken utterance, usually 2-4 seconds.
Continuous speech recognizers are just now beginning to
appear on the market but are expensive and their
capabilities and reliability have yet to be realistically or
practically evaluated. For the remainder of this thesis our
discussion will be confined to discrete recognition systems.

Figure 1.   Speech Recognition Model
(From Reference 4)
[Legible block labels: Acoustic Analysis; Acoustic Parameters;
Phonetic Feature; Information-carrying features]

Two other concepts of speech recognition to be discussed
are those of speaker independence and vocabulary size.
Speaker dependent systems are those which require speaker
adaptation (or 'training') in order to achieve recognition.
This is in contrast to speaker independent systems which
will recognize speech regardless of the speaker. In terms
of speech recognition equipment and their associated
vocabularies, most recognizers work well with small
vocabularies of 10-50 words [Ref. £0]. The
possibility of confusion between words increases as the
vocabulary size increases, and to some extent the chance of
similar sounding words increases with such larger
vocabularies.

At this juncture it is appropriate to expand our
definition of 'words' to encompass more than just individual
words. As used herein, 'word' is used interchangeably with
the term 'utterance' and may be either a singular mono- or
polysyllabic word or a combination of mono- or polysyllabic
words joined into a phrase (e.g., Place-a-Circle-on-Moscow).

The four processing functions [Ref. 6] contained in a
limited vocabulary voice recognition system, as shown in
Figure 2, consist of a transducer, preprocessor, feature
extractor, and a final decision-level classifier.

1. Transducer: The microphone is the interface between
the user and the system and converts the spoken phrase
into electrical signals that are analyzed by the other
components of the system.

2. Preprocessor: No matter how it is represented,
spectral information must be explicitly or implicitly
contained in all speech encodings. The initial
analyses produce parametric representations [Ref. 7]
and take place in the preprocessor. This segment of
the system transforms the speech signal in order to
enhance certain properties and make them more easily
detectable in a speech recognition system. The signal
is normalized in time by dynamic programming for
subsequent comparisons with various reference
patterns. Data compression removes any extraneous or
irrelevant information. Both time and frequency
domain analytical techniques are performed on the
input signal. Speech analysis is achieved by either
direct analog spectrum analysis via fast Fourier
transform (FFT) in the frequency domain, or linear
predictive coding (LPC) in the time domain.

Figure 2.   Processing Functions of a Speech
Recognition System (From Reference 6)
[Legible block label: Classifier (Decision Logic)]


3. Feature Extraction: The key processing function in a
pattern recognition system is the feature extractor.
The more optimal the set of acoustical features
extracted and sent to the classifier, the less complex
the classifier need be to achieve a given accuracy
level. This segment of the system produces a set
number of significant acoustical features (depending
on the individual recognizer), a few of which include
spectral slopes, phonetic classification, and initial
estimate of word boundary.

4. Classifier: The classification process is performed
in software using a minicomputer. When a speaker
issues an utterance, the encoded features and their
time of occurrence are stored in short term memory.
The duration of the utterance is broken into time
segments and the features reconstructed into the
normalized time base. Reference patterns, previously
input by the speaker for the system's vocabulary of
words, are compared to the feature occurrence patterns
and a 'best-fit' or 'closest-match' determined for a
word decision. The number of bits of information for
the feature map of each reference pattern is
determined by mapping the number of acoustic features
onto the number of time segments.
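
The time normalization and feature map described for the
classifier can be illustrated with a minimal sketch. It assumes
that each detected feature is reported as a (time, feature index)
pair and that the map holds one bit per feature per time segment;
the function names and the sizes used in the note below are
hypothetical and are not taken from Reference 6.

```python
from typing import List, Tuple


def build_feature_map(events: List[Tuple[float, int]],
                      duration: float,
                      n_segments: int,
                      n_features: int) -> List[List[int]]:
    """Place (time, feature_index) events on a normalized time base.

    Returns an n_segments x n_features grid of 0/1 flags, so two
    utterances of different length yield maps of the same size.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    grid = [[0] * n_features for _ in range(n_segments)]
    for t, feature in events:
        # Scale the event time to a segment index; clamp the end point.
        segment = min(int(t / duration * n_segments), n_segments - 1)
        grid[segment][feature] = 1
    return grid


def feature_map_bits(n_segments: int, n_features: int) -> int:
    """Bits per reference pattern: one bit per feature per segment."""
    return n_segments * n_features
```

Under these assumptions, an utterance spoken slowly and the same
utterance spoken quickly map to the same grid, and a map of 16
segments by 32 features would occupy 512 bits.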
The first two processing functions are accomplished by a
hard wired preprocessor and feature extractor. This
achieves real-time processing since only the classification
function is performed in a general-purpose minicomputer
[Ref. 6: p. 177].

A discrete word recognizer must be 'trained' for
individual talkers and/or words. This can be done by a user
simply speaking a set number of training samples into the
device to provide a reference set of features. The system
stores in memory the reference set of word features for each
word (utterance) the user has spoken. Once the system is
trained, the user may speak words into the device during
normal operation and these are compared with the stored
patterns. The 'closest fit' is selected as the recognized
word. This sequence of events is commonly partitioned into
the training and recognition modes of operation.
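
The two modes of operation can be sketched as follows. This is
an illustration only: it assumes each utterance has already been
reduced to a fixed-length list of numeric features, that training
samples are averaged into one reference pattern per word, and that
'closest fit' means smallest squared Euclidean distance. The class
and method names are hypothetical, and a commercial recognizer may
use a different representation and distance measure.

```python
from typing import Dict, List, Sequence


class TemplateRecognizer:
    """Toy speaker-dependent recognizer: one averaged template per word."""

    def __init__(self) -> None:
        self.templates: Dict[str, List[float]] = {}

    def train(self, word: str, samples: Sequence[Sequence[float]]) -> None:
        """Training mode: average the user's samples into a reference."""
        if not samples:
            raise ValueError("at least one training sample is required")
        length = len(samples[0])
        if any(len(s) != length for s in samples):
            raise ValueError("all samples must have the same length")
        self.templates[word] = [sum(col) / len(samples)
                                for col in zip(*samples)]

    def recognize(self, features: Sequence[float]) -> str:
        """Recognition mode: return the word with the closest template."""
        if not self.templates:
            raise RuntimeError("recognizer has not been trained")

        def distance(word: str) -> float:
            template = self.templates[word]
            return sum((a - b) ** 2 for a, b in zip(template, features))

        return min(self.templates, key=distance)
```

Because the templates come from one speaker's own samples, a
different speaker, or the same speaker talking differently than
during training, moves the input away from every stored pattern.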

There are two types of errors that can occur in speech
recognition. The first is a rejection, or the inability of
the recognizer to correctly classify an utterance. The
second, and in a practical sense more troublesome, is a
misrecognition. This occurs when the recognizer classifies
an utterance as something other than what was spoken.
Better recognizers usually have recognition algorithms
designed to reject rather than guess at questionable words.
Higher quality systems such as Threshold (Models eze and
680) have error rates that are quite acceptable [Ref. 3, 9,
10]. Extensive experimentation has shown approximate error
rates to be between 0.2 and 11.4 percent [Ref. 6: pp. 179-
180]. Of course, what constitutes an acceptable error rate
is critically dependent upon the particular application and
data entry rate.
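
The distinction between the two error types can be made concrete
with a short sketch. It assumes the classifier produces a distance
to each vocabulary word and rejects when even the best match is
worse than a threshold; the function names and the threshold are
hypothetical, not a description of any particular recognizer.

```python
from typing import Dict, Iterable, Optional, Tuple


def classify_with_reject(distances: Dict[str, float],
                         reject_threshold: float) -> Optional[str]:
    """Return the closest word, or None (a rejection) if it is too far."""
    if not distances:
        return None
    best = min(distances, key=distances.get)
    return best if distances[best] <= reject_threshold else None


def tally_errors(trials: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, float]:
    """Compute error rates from (spoken, recognized) pairs.

    A recognized value of None counts as a rejection; any other
    mismatch counts as a misrecognition.
    """
    total = rejections = misrecognitions = 0
    for spoken, recognized in trials:
        total += 1
        if recognized is None:
            rejections += 1
        elif recognized != spoken:
            misrecognitions += 1
    if total == 0:
        raise ValueError("no trials supplied")
    return {
        "rejection_rate": rejections / total,
        "misrecognition_rate": misrecognitions / total,
        "error_rate": (rejections + misrecognitions) / total,
    }
```

Lowering the threshold in such a scheme trades misrecognitions
for rejections, which is the design preference attributed above to
the better recognizers.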

B.   THE VALUE OF SPEECH RECOGNITION

The Department of Defense has been very active in the
past few years in its efforts to assess the merits of
voice recognition with machines. Such locations as the
Naval Postgraduate School, Wright Patterson Air Force Base,
Rome Air Development Center, Naval Air Development Center
and assorted other agencies and contractors have conducted
extensive tests in order to examine human interaction with
machines through the use of voice input and other
modalities. In order to comprehend the need for further
research pertaining to voice input technology, it is
essential to review the advantages and limitations that this
type of technology offers. More importantly, it is
essential to understand its potential capabilities and
applications in a military environment. Is speech
recognition beneficial (considering costs of $300 to
$80,000+), practical, and usable enough to justify the continued
expenditures of research and development funds (6.1 and 6.4)
and operational monies?
1.   Advantages of Speech Recognition


Proponents  of  computer  recognition  of  speech  will 
continually  extol  the  virtues  and  unlimited  possibilities 
the  technology  offers.  In  an  abbreviated  fashion,  the  five 
general advantages of voice input to machines may be
summarized  as  follows: 

—  Natural  communication 

—  Training 

—  Multimodal  communication 

— Fast communication

—  Error  reduction  in  data  input 

Speech is our most natural mode of communication.
It is a familiar, spontaneous and convenient method of
expressing one's thoughts, ideas, or intentions. Untrained
users of voice recognition systems, regardless of whether
they can read, write, type or keypunch, can all speak or
make sounds. These characteristics of the speech input
modality make it applicable for users at all general skill
levels, from systems engineers to computer operators to blue
collar workers on an assembly line.

A user of speech recognition equipment requires
little or no training. Users have only to restrict their
spoken utterances to those which the machine can recognize.
In the case of discrete systems, isolated words are
separated by a short pause so as to ease the location of
word boundaries, and word choices are restricted to those which
the machine has been trained to recognize. Although this appears
to be disadvantageous, it is more realistically a compromise to
natural speech in that no adverse effects are caused the
user in terms of operating the speech recognition equipment.

Experimentation [Ref. 11: p. 608] has shown that
speech, instead of interrupting communications necessary to
perform other tasks, can enable users to do these tasks
simultaneously with voice and thereby reduce or, at a
minimum, not add to the time required to perform a complex
task. The advantage of having one's hands and eyes free to
do other tasks is perhaps the pivotal point in the
determination of applicability of speech recognition
devices. This multimodal aspect allows us to place the
microphone anywhere (headset mounted, hand-held, on a stand)
and still communicate commands and information. Threshold
Technology even has a wireless microphone [Ref. 12] that
permits extensive mobility while talking to computers.

The fastest modality for communications by a human
is speech. An individual can speak twice as fast as the
average typist can type [Ref. 5: p. 45]. This has been
clearly demonstrated by Ochsman and Chapanis [Ref. 11] whose
experimental results showed that communication via
typewriter or handwriting could not approach speech in terms
of speed or task efficiency. Further substantiation from
the Naval Postgraduate School [Ref. 8: p. 2] showed that
voice entry was 17% faster than typing, after only three
hours of training. Additionally, while speech recognition
accuracy is slightly degraded by mental or motor loading of
the user [Ref. 13: p. 32], voice is nevertheless faster and
more accurate than other input modes when the user must
perform another task while simultaneously interacting with
the speech recognition equipment [Ref. 8: p. 2].

By now it is clear that speech recognition permits
data entry directly into the computer without intermediate
steps such as manual transcription or keypunching which are
subject to error. Again, research at the Naval Postgraduate
School has shown that 183% more errors occurred in manual
data manipulation (typing) than by voice [Ref. 8: p. 2].
Such common entry errors as the transposition of digits,
which are usually caused by eye movement or other
distractions, are almost eliminated with the use of
automatic speech recognition [Ref. 14].

2.   Limitations  of  Speech  Recognition 

If a particular technology were devoid of errors or
practical  limitations,  we  could  assume  universal  application 
and  implementation.  Although  the  advantages  of  speech 
recognition  are  seemingly  well  established,  there  do  exist 
several  problems  associated  with  the  ability  to  speak  to 
machines.   These  limitations  include: 

— User variability

— Constrained speech

— Isolated speech

—  Breath  noise 

—  User  confusion 

—  Environmental  factors 

Speakers exhibit a wide range of personal
characteristics that add a significant measure of difficulty
in the ability of a machine to recognize speech. A
speaker's sex, geographic origin, and articulation
experience are just a few of the elements that result in a
user's variability. Consistency is also a key element in
successful recognition accuracy. A speaker may talk quite
differently in training the machine as compared to when he
or she may use it in a practical application. Additionally,
physical changes in the speaker such as age, physical
condition, stress (physical or emotional), or fatigue, to
name a few, can induce variability that will ultimately
affect successful recognition accuracy.

An isolated word recognition system imposes a
restricted (constrained) vocabulary, both in terms of size
and content, upon the user. This becomes a limitation when
we consider that most people are accustomed to speaking in
natural, fluent prose. Because of the limited vocabulary,
users must be careful of the types of words included for
recognition. The similarity of sound structures between
words (e.g., Nine vs. Time) adds a measure of confusion that
can subsequently affect overall performance. Design of
a vocabulary for a particular application is an important
and controllable factor in determining the acceptability of
voice input for a given task.

Because isolated word recognizers depend
significantly upon the detection of a minimum pause between
words, word boundary detection becomes perhaps the single
most critical limitation. The usual method is to measure
changes in energy levels [Ref. t]. An isolated word is
detected at a point where the energy in the acoustic signal
rises above a certain threshold. At the end of the word,
the energy drops, and the resultant silence indicates that
the utterance is over. But energy fluctuations are not
enough to detect all word boundaries, and thus advanced
detection techniques will have to involve detection and
inclusion of stop consonants within words, while eliminating
pauses due to 'lip-smacks' or breath noise.
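
The energy-threshold method of boundary detection can be sketched
as follows. The sketch assumes the signal has already been reduced
to one energy value per fixed-length frame and that a word ends
only after a minimum run of low-energy frames; the function name,
frame representation, and parameter values are hypothetical.

```python
from typing import List, Sequence, Tuple


def find_words(energy: Sequence[float],
               threshold: float,
               min_pause_frames: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of detected words, end exclusive.

    A word starts when energy rises above the threshold and ends once
    min_pause_frames consecutive frames fall at or below it. Shorter
    dips, such as the closure of a stop consonant, stay inside the word.
    """
    words: List[Tuple[int, int]] = []
    start = None
    silence = 0
    for i, value in enumerate(energy):
        if value > threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_pause_frames:
                # The word ended where the silent run began.
                words.append((start, i - silence + 1))
                start = None
                silence = 0
    if start is not None:
        # Signal ended before a full pause; drop trailing silence.
        words.append((start, len(energy) - silence))
    return words
```

With a short minimum pause, a one-frame dip splits a word in two;
with a longer one the dip is absorbed. The same mechanism shows why
breath noise is troublesome: any exhalation whose energy exceeds the
threshold is indistinguishable from speech by energy alone.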

In a limited vocabulary, isolated word recognition
system, breath noise can be a serious problem [Ref. 6: p.
174]. An individual who is involved in little or no
physical movement while engaged with a voice recognition
system can achieve very high recognition accuracy. This
accuracy can soon deteriorate once the user begins to move
around. Inhaling will not cause any adverse effects when
using a close-talking, noise-cancelling microphone, but
exhaling will produce signal levels comparable to speech
levels. As physical activity increases so does one's
breathing rate, and as a result increased exhalation will
lead to the above mentioned deterioration in recognition
accuracy.

While voice input provides multimodal
communications, this particular advantage has an inherent
limitation in that the user can become confused as to what
mode to use. As a result, input modalities can become
confused and interfere with each other, so that the total
rate of information transfer may not be as high as the sum
of the rates possible with each separate modality.

Finally, the environment in which the speech
recognition device is placed may have an inadvertent effect
on recognition accuracy. For example, speech recognition in
an aircraft cockpit may be degraded due to engine noise or
conflicting voices emanating via aircraft radio
communications. Or, consider the placement of such
technology in a crowded Military Command Center where its
reliability can be affected by background noise from other
members located in the nearby work space.

C.   APPLICABILITY OF COMPUTER RECOGNITION OF SPEECH

1.   Commercial Applications

The  first  voice  input  systems  to  be  used  by  industry 
were installed in late 1972 and early 1973 [Ref. 15]. These
early  applications  included: 

—  quality  control  and  inspection 
— automated material handling

—  direct  voice  input  to  computers 

Their successful implementation was due in large part to
recognition accuracies that were greater than or equal to
the manual keying accuracies obtained from the same
personnel.

In most quality control and inspection processes the
inspector's hands and/or eyes are occupied in the inspection
task. Through the use of a voice recognition system it is
possible to combine the inspector's normal work requirements
with the simultaneous entry of all data measured and
observed. Owens-Illinois Corporation installed voice data
entry equipment in early 1973 for the inspection of color
television faceplates. Here was an application where the
inspector "had to manipulate, orient, and measure parameters
using gauges and meters". The requirement to simultaneously
record the measurement data also existed. In this example
the operator was able to achieve both tasks at once [Ref. 6:
pp. 182-183].

Voice entry has been utilized in recent years to
control the movement of materials such as parcels,
containers, baggage, etc. through distribution and sorting
centers. A voice controlled package routing system
installed by SS Kresge in November 1974 allowed just one
operator to handle each item, read the label, and speak the
destination code for each carton into his/her microphone.
Formerly this had been an operation that required two
persons and still resulted in the 'bunching' up of different
size packages. Following the installation of voice
activated sorting equipment, the bunching problem was
eliminated, productivity increased, and sorting errors
reduced [Ref. 6: p. 165].

2.   Military  Applications 

These applications may be placed in the general
categories of equipment and process control, field data
entry, data management, and cooperative man-machine tasks.
A more definitive classification was proposed by Beek et
al. in 1977 [Ref. 16] to include the general areas of
Security, Command and Control, Data Transmission and
Communication, and Processing Distorted Speech. Table I
provides a recapitulation of military tasks that could be
considered for speech recognition technology.

Of particular interest is the use of speech
recognition for Command and Control applications. The term
C3, Command, Control, and Communications, refers to an
overall system comprised as a minimum of these key elements:

a. Command Authority: The commander provides the central
authority, unity of purpose, and the overall concept
as to how operations will be conducted to accomplish
mission objectives.

TABLE I

MILITARY APPLICATIONS FOR SPEECH RECOGNITION
(From Reference 16)

I. SECURITY
A. Speaker Verification (authentication)
B. Speaker Identification (recognition)
C. Determination of emotional effects (e.g., stress)
D. Recognition of spoken codes
E. Secure access voice identification
F. Surveillance of communication channels

II. COMMAND AND CONTROL
A. System control (ships, aircraft, situation displays, etc.)
B. Voice operated computer input/output
C. Data handling and record control
D. Material handling (mail, baggage, publications)
E. Remote control (hazardous materials)
F. Administrative record control

III. DATA TRANSMISSION AND COMMUNICATION
A. Speech synthesis
B. Vocoder systems
C. Bandwidth reduction
D. Ciphering/coding/scrambling

IV. PROCESSING DISTORTED SPEECH
A. Diver speech
B. Astronaut communication
C. Underwater telephone
D. Oxygen mask speech
E. High 'G' force speech

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D.  Organization:  This  element  provides  the  pathways 
through  which  the  plans,  priorities,  and  directives  of 
the  commander  are  provided  to  the  force  and  through 
which  information  pertaining  to  the  forces  can  fie 
provided  the  central  authority.  These  pathways  are 
found  at  each  echelon  in  the  fcrrr  of  command  pests, 
operations  centers,  or  command  centers. 

c.  Corrmunicaticns :  This  provides  the  means  for 
transmitting  plans,  priorities,  and  orders  to  elements 
of  the  force  and  the  means  by  which  the  forces  nay 
inform  the  Commander  cf  their  activities  and  needs. 

d.  Information:  A  Key  element  that  facilitates  control 
by  confronting  the  Commander  with  oily  that 
information  required  to  support  the  decision-ma"kiag 
process.  Information  supports  both  the  staff 
planning  and  command  decision-making  process  at  ail 
levels  . 

The command centers that provide the requisite
organizational framework perform several vital functions
for the Commander. First is the capability to communicate
securely, preferably by voice, over a wide choice of
circuits. Second, each command center has the task of
integrating information which comes from its supporting
elements. A third capability provided by these centers is
the processing and display of information. The fourth
function, associated with the third, is the quick and
accurate dissemination of information, reports, and
directives for the Commander.

We are particularly interested in the function of
information processing and dissemination, as it provides a
suitable application for computer recognition of speech.
Command center automation, resulting in more efficient
communications, will lead to increased productivity. In its
broadest sense, communication is the management of
information, and information, not paper, is the chief
product of the command center. Our C3 systems that are
designed and fielded for these centers, and speech
recognition as a component of such, can provide our
Commanders the capability to "observe", "decide", "act", and
"react" with speed, decisiveness and accuracy.

Navy feasibility studies sponsored by Naval
Electronics Command and conducted by Dr. G. K. Poock of the
Naval Postgraduate School examined the potential for voice
data entry for Command, Control, and Communications. Two
voice recognition systems were installed in 1980 at Fleet
Headquarters, Commander-in-Chief Pacific (CINCPACFLT) in
Hawaii to examine the benefits and limitations of voice
input for operation of the Worldwide Military Command and
Control Time-Sharing System (WWMCCS TSS) and the Ocean
Surveillance Intelligence System (OSIS) [Ref. 17: p. 34].

Poock has also demonstrated that using voice input
to exercise a typical scenario on the ARPANET, an
experimental network operating since 1969 that employs
packet switching technology and connects over 150 host
computers, was significantly faster and more accurate than
entering the commands manually [Ref S]. Twenty-four
subjects followed a fixed scenario of instructions where
they accessed the ARPANET, logged into different host
computers, read messages, sent messages, read files,
transferred files between host computers, deleted files and
interconnected host computers. Simulated command centers
operating on this network include the Naval Postgraduate
School (Monterey, California), Naval Ocean Systems Center
(San Diego, California) and CINCPACFLT (Hawaii).

Automatic speech recognition has also been found to
have considerable potential for imagery interpretation and
intelligence report generation [Ref. 17: p. 49]. A
significant amount of research has been performed for the
Defense Mapping Agency (DMA) for such applications as voice
data entry for the processing of Digital Landmass System
(DLMS) data, preparation of Flight Information Publication
(FLIP) data and ocean-depth measurements for digitized
cartographic applications. In all these applications the
environment is such that the operator's hands are busy, and
the work frequently involves the use of stereo optics and
other special devices. Voice has been shown experimentally
to be a faster, easier, and less fatiguing mode of data
entry than historically more conventional means [Ref. 17:
p. 37]. More recently, the feasibility and advantages of
voice input technology were described for use in the COINS
Network Control Center (CNCC). The Community On Line
Intelligence System (COINS) interconnects on-line
information storage and retrieval systems located at a
number of locations within the United States intelligence
community [Ref. 18].

III.   HUMAN FACTORS IN SPEECH RECOGNITION

A.   DEFINITION  AND  PURPOSE 

Human factors is concerned with improving the
productivity of the user by taking into account human
characteristics in the design of a system. As described by
Huchingson [Ref. 19: p. 4],

The term "human factors" is more comprehensive, covering
all biomedical and psychosocial considerations applying
to man in the system. It includes not only human
engineering, but also life support, personnel selection
and training, training equipment, job performance aids,
and performance measurement and evaluation.

The people referred to in this definition are those who
typically operate, maintain or service the system. They are
those who will interact with the system's design. When the
focus is on a broader interpretation it is appropriate to
speak of a Human Factors Subsystem or Personnel Subsystem,
as was described earlier.

Human factors engineering deals principally with the
many factors involved in the design of a new system, from
hardware to personnel. For our efforts in this analysis,
the current technology has been determined to be acceptable
and, experimentally as well as operationally, reliable for
its use in a Command and Control environment. Now, user
variability is to be investigated further in terms of how it
affects recognition accuracy.

Since energy in a speech signal is usually displayed in
terms of frequency, intensity and time, it would seem
plausible that each word should have a unique acoustic wave
pattern and, if so, word recognition would be a simple
matter of the voice recognition system scanning the pattern,
comparing the sample pattern with a data bank of reference
word patterns, and deciding which word was spoken.
Unfortunately, human variability defeats this simplistic
approach. Our purpose then is to discuss the human as a
component in a complex system designed by humans and to note
the fundamental advantages and limitations of the human in
relation to an automated voice recognition system.
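
The pattern-comparison idea described above can be sketched in a few lines. The sketch assumes, purely for illustration, that each utterance has already been reduced to a fixed-length list of numbers. The function names, the feature values, and the use of Euclidean distance are hypothetical and are not drawn from any particular recognizer.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(sample, templates):
    """Return (word, distance) for the stored reference pattern closest to the sample."""
    best_word, best_dist = None, float("inf")
    for word, reference in templates.items():
        d = distance(sample, reference)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word, best_dist


# Hypothetical reference patterns stored during training (illustrative values only).
templates = {
    "alpha": [0.9, 0.1, 0.3, 0.7],
    "bravo": [0.2, 0.8, 0.6, 0.1],
}

# A later utterance of "alpha": close to, but not identical with, the stored pattern.
sample = [0.8, 0.2, 0.35, 0.6]
word, dist = recognize(sample, templates)
print(word, round(dist, 3))
```

In this toy case the sample lies close to one reference and is matched correctly. Human variability amounts to the sample drifting away from the pattern stored at training time, which is where so simple a scheme begins to fail.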

B.   FACTORS  AFFECTING  RECOGNITION  ACCURACY 
1.   General

Limiting vocabularies to 100 words has resulted in
identification accuracies of between 98% and 99% in a
controlled laboratory environment. In an operational or
field setting, recognition accuracies have been reported
as low as 50% [Ref. 20: p. 6Z6]. Factors noted as
interfering with successful identification have included
background noise, inconsistent microphone placement,
insufficient training, inconsistent speaking style, and
lack of user cooperation. Lea, in a paper titled "What
Causes Speech Recognizers to Make Mistakes?" [Ref. 21], calls
for the determination of those factors that influence
recognition accuracy rather than the repeated assessment of
transitory devices. Table II summarizes the four 'dimensions
of difficulty' Dr. Lea has proposed. What needs to be
accomplished is the characterization of the relative effects
of changes along each of these four dimensions or, more
simply stated, finding the factors influencing the accuracy
of machines that recognize speech.

Because so many variables affect recognition
accuracy, the list in Table II may be reorganized in a
"communication-theoretic" framework. This framework models
the speech recognition error rate as a function of seven
complex sets of factors [Ref. *: pp. 69-93] that include:

—  Task Factors

—  Human Factors

—  Language Factors

—  Channel and Environmental Factors

—  Algorithmic Factors

—  Performance Factors

—  Response Factors

TABLE II

DIMENSIONS OF DIFFICULTY FOR SPEECH RECOGNITION
(From Reference 5)

TASK AND PERFORMANCE REQUIREMENTS
    1.  Form of speech to be recognized
    2.  Accuracy requirements
    3.  Required throughput rates
    4.  Type of device necessary

HUMAN VARIABILITY
    1.  Sex
    2.  Dialect
    3.  Vocal tract size
    4.  Vocal cord characteristics
    5.  Pronunciation habits of speaker
    6.  Physical state
    7.  Psychological state
    8.  Workload
    9.  Cooperativeness
    10.  Time of day/week
    11.  Time since training
    12.  Number of training samples/word
    13.  Rate of talking

LANGUAGE DIFFICULTIES
    1.  Size of active subvocabulary
    2.  Word length
    3.  Word sound structure
    4.  Confusability
    5.  Language spoken
    6.  Syntactic, semantic, and pragmatic constraints
    7.  Enhanceability
    8.  Stress pattern
    9.  Intonational variability
    10.  Rhythm and timing variability

ACOUSTIC DIFFICULTIES
    1.  Noise level
    2.  Type(s) of noise
    3.  Bandwidth
    4.  Spectral distortions
    5.  Transducer characteristics
    6.  Placement of the transducer
    7.  Amplitude
    8.  Vibration
    9.  Acceleration

It is the set of Human Factors with which this
experiment and analysis are principally concerned, for it is
this stage of the model that has a major impact on speaker
variability. This set of human factors can be further
subdivided [Ref. 21: p. 2] in order to monitor their
influence on recognition error rates. A few of these are
listed below:

—  Speaker  Experience 

—  Training  Method 

—  Sex  of  the  Speaker 

—  Physical  Dimensions  of  the  Speaker 

—  Geographic  Origin  of  the  Speaker 

—  Speaker  Dialect 

—  Physical State of the Speaker

—  Psychological State of the Speaker

—  Speaker Cooperativeness

—  Time of Day or Week

Because different speakers may demonstrate widely
varying methods of pronouncing words or phrases, the above-
listed factors may be further separated into two categories:
those occurring between speakers and those affecting each
individual speaker. First, some of the differences between
speakers that induce variability will be briefly examined,
and then the variabilities apparent within each speaker that
can affect recognition accuracy will be discussed.
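
The distinction between the two categories can be made concrete with a small numerical sketch. It assumes that each speaker has completed several test sessions and that an error rate (errors divided by utterances attempted) is available for each session. The speaker labels, the session figures, and the choice of population variance as the measure of spread are illustrative assumptions only, not the analysis used in this thesis.

```python
from statistics import mean, pvariance


def error_rate(errors, utterances):
    """Fraction of utterances that the recognizer got wrong."""
    return errors / utterances


def variability(sessions):
    """Split the spread in error rates into between-speaker and within-speaker parts.

    sessions maps a speaker label to a list of per-session error rates.
    Returns (between, within): the variance of the speaker means, and the
    average of each speaker's own session-to-session variance.
    """
    speaker_means = [mean(rates) for rates in sessions.values()]
    between = pvariance(speaker_means)
    within = mean([pvariance(rates) for rates in sessions.values()])
    return between, within


# Hypothetical per-session error rates for two speakers.
sessions = {
    "speaker_A": [error_rate(2, 100), error_rate(4, 100)],
    "speaker_B": [error_rate(10, 100), error_rate(12, 100)],
}

between, within = variability(sessions)
print(between, within)
```

Here the two speakers differ from one another far more than either differs from session to session, so the between-speaker term dominates.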
2.   Differences  Between  Speakers 

Speaker Experience: This factor can take on a two-fold
meaning when looking at it as a source of variability.
First is the experience of using voice recognition
equipment. Experienced voice recognition users should be
expected to have a higher and more reliable recognition
accuracy than those who are 'naive' to the technology.
These experienced users are comfortable using the equipment,
less likely to be intimidated by the system, and
familiar with its performance capabilities from previous
usage. The other meaning of speaker experience has to do
with job skill. Can a user who operates in a microphone
environment on a daily or regular basis, such as an Air
Traffic Controller or a Pilot, be expected to have better
recognition rates than those who have never spoken into a
microphone? A data processor who works regularly in an
environment demanding precise data entry by keyboard might
have the type of experience or skill factor that would
provide an edge over a prospective user possessing only
basic typing skills. This type of experience overlaps
slightly with speaker cooperativeness and will be elaborated
upon later.

Method of Training: The ideal form of voice
interaction would be for a user to pick up the microphone,
speak commands the machine can understand, and for the
appropriate response to take place. Naturally, this is the
goal of speaker independent systems, but since humans all
speak differently and our form of speech recognizer is
discrete, we are required to provide the machine some
information about how we speak each word intended for our
desired vocabulary (i.e., training). The method by which the
machine is trained by the user will in large part dictate
subsequent recognition accuracy. If the user is closely
supervised and made to speak the particular vocabulary
carefully, then we should be able to expect higher
recognition rates than for the user who is given
cursory instructions on the use of the equipment and allowed
to go on independent of further supervision during the
training mode. An adjunct of training method is the number
of training 'samples' or pronunciation patterns. It is
difficult to achieve accurate speech recognition when the
number of training passes per word is small or smaller than
manufacturer specifications [Ref. 22]. Using identical
equipment, it would still be reasonable to anticipate some
speakers, having had fewer training samples per word,
having more success than others who have had more
samples per word.

Sex: Male voices have lower frequencies than
female voices, and a more detailed spectral structure
results from the lower pitch of their voices. This detailed
structure is more indicative of the vocal mechanism and of
the intended vowels and consonants spoken. Male voices tend
to fare better with recognizers employing frequency domain
analysis while female voices tend to have greater success
with machines using time domain analysis [Ref. 5]. A recent
comparison was conducted [Ref. 22] which revealed no
statistically significant difference between the sexes.
Although sex is not a primary objective of the thesis, it
remains a source of variability that merits some measure of
analysis.

Speaker Dialect: Dialects not only affect the
specific sound produced for each vowel or consonant type,
but also exhibit different dynamics of speech production.
For example, Southerners have their readily identifiable
drawl, whereas a New Yorker will tend to say "Toid" rather
than "Third" and residents of Cambridge, Massachusetts can
be heard to talk about "Hahvahd" instead of "Harvard".

Physical Dimensions: Throughout the literature on
speech recognition one will see speaker variability
attributed to a variety of factors, none of which includes
the physical dimensions of the speaker. An examination of
the recognition accuracy for a selected sample population
based on physical dimensions would provide an interesting
insight into the ramifications of such a factor as a
component within a personnel selection subsystem. In other
words, what effect, if any, will height and weight have on
recognition accuracy?

Geographic Origin: This particular factor is
multidimensional, consisting of several sub-factors which
require careful examination:

—  Place of birth

—  Geographic area of upbringing

—  Ethnic background

—  Religious preference

The above may impose idiosyncratic or social differences in
habits which can produce variations in sound and
subsequently in pronunciation. These sub-factors all
contribute a measure of variety that can presumably affect
recognition accuracy.

3.   Differences  Within  Speakers 

Physical State: The present physical state of a
user of voice recognition equipment can precipitate
variability in his or her voice. For example, a cold, some
form of pathological condition, fatigue, etc. can alter the
speaker's voice. The individual's voice quality could be
different based on physical conditioning. Is the user who
works out regularly and stays in excellent physical
condition more likely to show higher recognition rates than
one who rarely exercises, smokes regularly and generally is
not in the best of health?

Psychological State: Spielberger [Ref. 23: p. 29]
defines transitory or state anxiety as a complex, unique
emotional condition that can vary in intensity and fluctuate
over time. State anxiety may be thought of as consisting of
unpleasant, consciously perceived feelings of tension and
apprehension with an accompanying activation or arousal of
the autonomic nervous system. The concept of trait anxiety
refers to the relatively stable individual differences in
anxiety proneness. It may also be a reflection of the
frequency and intensity with which state anxiety has been
previously manifested and the probability that such anxiety
will occur in the future [Ref. 23: p. 39]. That
physiological functioning is affected during periods of
anxiety is readily apparent. The degree to which speakers
deal with state or trait anxiety may well be a significant
variable to consider in the examination of error rates
of voice recognition systems.

Speaker Cooperativeness: How enthusiastic and/or
willing a speaker is toward the use of voice recognition
equipment could induce speaker variability and hence
affect subsequent recognition accuracy. In a military
environment where many job positions are of a non-voluntary
variety, it is conceivable that voice recognition users
will be selected and told to operate the equipment
regardless of their personal preferences. If the user
distrusts the technology or prefers manual entry, and is
still required to use voice, we have developed a
non-cooperative user. A non-cooperative user is, therefore,
one who is consciously trying to undermine the successful
operation of the machine. The cooperative user is one who
is willing to help the machine by saying precisely what the
machine wants and pronouncing it in a clear and consistent
manner. There is a certain grey area surrounding this
factor with the presence of users who, although not
consciously trying to confuse the device, are not fully
committed to "helping the machine" to recognize the correct
utterances.

Time of Day/Week: Each person's speech is variable
depending upon time of day, changing from morning to evening
and even changing progressively over a period of time [Ref.
5]. An examination of recognition performance over extended
periods of time [Ref. 24: p. 1] showed a statistically
stable performance over time (21 weeks) with no serious
degradation occurring as time elapsed. Nevertheless, a user
who has a gap in time between training and operational use
may forget any special ways he/she trained the machine. How
much of a gap is tolerable is a subject for future research.

4.   Miscellaneous Factors

Some additional human factors that have been
proposed [Ref. o] deserve a brief description. They have
been relegated to a separate section because, for one reason
or another (lack of equipment, current technical skills,
lack of measurable quantitative data, etc.), experimental
examination at the present time has been precluded. These
factors include:

—  Form of speech

—  Speaker dependence

—  Rate of speech

—  Vocal tract size

—  Speaker's glottal spectrum

Form of speech refers to the type of voice
recognition system to be used, isolated or continuous.
Continuous systems, being a quantum step above isolated
systems in terms of complexity, bring about a greater
opportunity for speaker variability to manifest itself.
Such things as detection of word boundaries, slurring of
speech (e.g., "dija" vs. "did you"), and prosodic
characteristics could seriously affect recognition accuracy
because of the complications which a continuous speech
recognition system introduces.

A speaker independent system negates the requirement
for training, and thus variability between speakers becomes
a more critical factor for independent systems to contend
with. Independent recognizer performance will have to be
tailored to accommodate an unlimited number of potential
speakers and their associated variability.

The faster a person speaks, the more likely it is that
the expected pronunciation will be altered due to slurring,
deleted syllables, etc. If a machine is trained to one
form of pronunciation and at one particular rate of speech,
a differing rate in an application mode will cause an
increase in recognition difficulty. With an isolated word
recognizer to be used in the experimentation, requiring a
minimum pause of 100 msec between utterances and utterances
not exceeding 2.0 seconds in duration, this particular
factor was not considered essential to the overall analysis.
It is, rather, an important factor in terms of continuous
recognition systems.
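
The two timing limits quoted above (a pause of at least 100 msec between utterances and an utterance length of at most 2.0 seconds) can be expressed as a simple check. The sketch assumes that the start and end time of each utterance are already known; the function name and the sample timings are hypothetical and do not describe how the recognizer itself detects word boundaries.

```python
MIN_PAUSE_S = 0.100      # minimum silence required between utterances
MAX_UTTERANCE_S = 2.0    # longest utterance the recognizer accepts


def timing_violations(utterances):
    """List the timing problems in a sequence of utterances.

    utterances is a list of (start, end) times in seconds, ordered by start.
    Returns a list of (index, reason) pairs, empty when every utterance fits.
    """
    problems = []
    for i, (start, end) in enumerate(utterances):
        if end - start > MAX_UTTERANCE_S:
            problems.append((i, "too long"))
        if i > 0:
            pause = start - utterances[i - 1][1]
            if pause < MIN_PAUSE_S:
                problems.append((i, "pause too short"))
    return problems


# Hypothetical timings: the third utterance starts too soon and runs too long.
spoken = [(0.0, 0.8), (1.0, 1.6), (1.65, 4.0)]
print(timing_violations(spoken))
```

A speaker who talks quickly tends to shorten the pauses, which is how rate of speech shows up even with an isolated word device.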

The size of the vocal tract will produce changes in
the formants of the speech signal; the smaller the vocal
tract, the higher the formants. This can have an impact on,
for example, transmission through limited bandwidth
channels. Vocal cord characteristics also produce
interspeaker variability such as pitch or "resonant" quality
of the voice. Speakers with more "resonant" voices that
project well will be easier for recognizers to handle [Ref.
5: p. 78].
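
The inverse relation between vocal tract size and formant frequency can be illustrated with the textbook approximation of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at F_n = (2n - 1)c / 4L. This model, the speed of sound value, and the two tract lengths below are assumptions introduced here for illustration; they are not taken from the references cited above.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm, moist air


def tube_formants(length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Uses the quarter-wavelength relation F_n = (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# Illustrative tract lengths only: a longer and a shorter vocal tract.
longer = tube_formants(17.5)
shorter = tube_formants(14.0)
print(longer)
print(shorter)
```

Every resonance of the shorter tube lies above the corresponding resonance of the longer one, which is the effect described in the paragraph above.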
IV.   DESCRIPTION OF THE EXPERIMENT

A.   OBJECTIVES AND CONSTRAINTS
1.   Objectives

As noted earlier, our overall objective was to
examine the human as a component in a complex system. In
narrower terms, this experimentation attempts to assess the
effect of differing occupational, operational, personal,
physiological, and psychological characteristics of a user
on the accuracy with which a currently available voice
recognition system will correctly interpret spoken
utterances. Subsequently, our discussion will address the
occurrence, if any, of existing quantitative parameters that
would enable us to differentiate between effective and
non-effective users of voice recognition systems.

The following specific characteristics are examined
in this thesis. Many of the individual characteristics, or
human factors, are self-explanatory while others are
provided with a brief explanation and/or rationale for
selection.

a.   Occupational Characteristics

This set of parameters examines the possible
effect on recognition accuracy due to differences inherent
in a user's occupational skill or job (military or civilian)
background. Specific characteristics include:

—  Job function: Comparison of recognition rates
between microphone experienced users (i.e., pilots,
air traffic controllers) and non-experienced users.

—  Branch of service: A factor with possible
consequences pertaining to its use in personnel
selection criteria.

—  Job satisfaction: A subjective evaluation by the
user as to his/her job satisfaction in the current
duty assignment and satisfaction within the
Armed Services.

—  Previous computer experience: Computer experienced
personnel (i.e., Data Processors) are expected to
have a better appreciation for the advantages of
voice input and thus be more conscious of their
efforts and positively motivated for higher
recognition accuracy.

—  Foreign language competency: Frequently military
and civilian members associated with DOD are
required to be able to speak a foreign language
fluently. This ability is another factor
that could affect one's speech.

b.   Operational Characteristics

This set of parameters examines the possible
effect on recognition accuracy due to factors surrounding
the operational use of voice recognition equipment.
Specific characteristics include:

—  Training method: Analysis of recognition rates for
those users who are supervised during the training
mode compared to those who are allowed to train the
equipment individually.

—  Time of day and week: A determination of whether
the time frame in which a speaker trains the
recognizer will have any subsequent effect on
recognition accuracy.

—  Equipment experience: Comparison of recognition
rates between experienced users of voice recognition
equipment and those who have never used the
equipment before ('naive' users).

—  Ease of use: The operational simplicity of the
equipment could affect a speaker's performance. For
example, a speaker who considers the recognizer a
complex and operationally difficult device will be
less likely to devote his or her maximum effort to
the task.

c.   Personal Characteristics

The following are various characteristics
considered to have a possible effect on an individual's
speech patterns and, hence, on the recognition accuracy
of a voice system. These parameters include:

—  Race

—  Marital status and family size: A correlate of
psychological state and, although equally likely to
be included as a psychological characteristic, it is
considered here as a criterion for personnel
selection. Family size refers to the number of
offspring the user has as opposed to the size of the
family in which one was raised.

—  Religious preference/Ethnic background

—  Accent or dialect

—  Place of birth/geographic origin

—  Level of education

—  Socioeconomic class: Similar in nature to the
characteristic of marital status, but considered
more for its merit in selection of personnel than for
its effect on individual speech patterns.

—  Dental or orthodontic care: Braces, corrections for
improper bite, or major oral surgery are considered
for their implications for the speech patterns of
those individuals and the resultant error rate.

d.   Physiological Characteristics

These characteristics are also considered to
have an effect on speech and as a result are factors of
interest when examining recognition accuracy and speaker
variability. These parameters include:

—  Height

—  Weight

—  Age

—  Physical condition: A subjective evaluation by the
user of his/her current physical condition.

—  Rate of airflow: Measurement of ventilatory
function to provide a diagnosis of conditions
affecting voice. This measurement can also be used
as an indication of possible airway obstruction.

—  Vital capacity: The maximum amount or volume of air
which can be exhaled following maximum inhalation.
This measure provides an estimate of the amount of
air potentially available for the production of
phonation.

—  Speech training: Examines whether formal speech or
voice training affects recognition accuracy.

e.   Psychological Characteristics

The current psychological state of a user, their
cooperativeness, and their personal attitudes toward
automation and voice all contribute toward the overall
effect on recognition accuracy. The particular parameters
investigated include:

—  Psychological anxiety

—  Speaker cooperativeness

—  Effect of errors on subsequent performance

—  Attitudes toward voice recognition equipment as a
time saving job aid

—  Attitudes towards computers and data automation.

In effect, items 4-6 are related to speaker cooperativeness
in that how a user feels about computers and voice
recognition could impact on their willingness to reliably
support the use of voice recognition equipment.
2.   Constraints

Accomplishment of test objectives was constrained
to the research facilities of the Naval Postgraduate
School. In the interest of time, experimentation was
limited to five weeks.

Because voice production is an extremely complex
event in which auditory, acoustic, and aerodynamic events
are produced by the interaction of physiological mechanisms,
it would be beneficial if we could measure as many vocal
parameters as possible in order to achieve a complete and
accurate picture of voice production, its associated
variability among speakers, and its correlation to voice
recognition accuracy. Lack of equipment, time, and/or
expertise precluded examination of such factors as:

-  Glottal waveform

-  Transfer function of the vocal tract

-  Sound-pressure level

-  Maximum duration of sustained phonation

-  Maximum frequency levels

-  Modal frequency level
B.   SUBJECTS

Forty-four subjects participated in the experiment on a
volunteer basis. The group was composed of 25 military
officers, 17 military enlisted, and 2 civilians. The
military officers representing the Army, Air Force and Navy
consisted of 21 males and 4 females while the enlisted
personnel representing the Army and Navy consisted of 11
males and 6 females. The civilians included a professor from
the NPS Oceanography Department and an employee of the
Defense Manpower Data Center (DMDC) in Monterey. The rank
or grade of the military subjects ranged from O-2 to O-4 for
the commissioned officers, CW2 to CW3 for the Warrant
Officers, and E-3 to E-7 for the enlisted personnel. The
subjects  ages  ranged  from  k£    to  47,  with  an  average  age   of 

It was desired that the speakers selected for the test
be representative of the population for which the recognizer
is to be used, in our case a Command and Control environment
and in particular, a military command center. Subjects
taking part in the experiment were representative of this
environment as shown by the grade distribution and types of
military occupational specialties, although some of these
specialties are not readily apparent in current job
description (i.e. Medical NCO).

Twenty-five of the subjects were from Fort Ord and
included a variety of backgrounds such as pilots, air
traffic controllers, signal officers, signal
non-commissioned officers (NCO's), and infantry platoon
sergeants. Five of the subjects were data processors; 2
from the Fleet Numerical Oceanographic Center in Monterey
and 3 from administrative offices of the Naval School.
Twelve subjects were students at NPS and enrolled in the
Command, Control, and Communications (C3) curricula. A wide
diversity in their backgrounds is illustrated by previous
job categories such as aviation, communications, systems
programming, communications maintenance, command and staff,
and nuclear engineering.

Twelve of the subjects had experience using voice
recognition equipment, having participated in previous voice
experimentation [Ref. 9]. A summary of subject
characteristics is provided in Table III.

C.   EQUIPMENT

1.   Voice  Recognition  System 


A Threshold Technology Inc., Model T-600 voice
recognition system was used to represent a commercially
available, state-of-the-art recognizer; one which has been
well documented as to its reliable recognition accuracy.
The T-600 is a speaker dependent, isolated word, speech
recognition device which automatically recognizes spoken
words and phrases. These words and phrases (utterances) may
be as brief as 0.1 second but will usually range from 0.25
TABLE III
SUBJECT CHARACTERISTICS

SEX
   Male:    34
   Female:  12

SERVICE
   Army:       27
   Navy:        8
   Air Force:   7

LOCATION
   Ft Ord:  25
   NPS:     16
   FNOC:     2
   DMDC:     1

VOICE
   Experienced Users:  12
   Naive Users:        32

RANK
   O-4, O-3, O-2, CW3, CW2, E-7, E-5, E-3, CIV

OCCUPATIONAL BACKGROUNDS
   Pilots:  2
   Air Traffic Controllers:  5
   Supply Officer:  2
   Medical NCO:  1
   Signal NCO:  3
   Engineer NCO:  1
   Professor:  1
   Data Processors:  5
   Medical Officer:  1
   Signal Officer:  3
   Finance Officer:  1
   Operations Officer:  1
   Computer Systems Manager:  1
   Graduate Students:  12  (which include)
      Pilots:  3
      Communications Officer:  2
      Communications Maintenance Officer:  2
      Systems Programmer:  1
      WWMCCS Programmer:  1
      Submarine Nuclear Engineer:  1
      Infantry Unit Commander:  1
      AUTODIN Supervisor:  1

to 1.0 seconds and must be separated by very short pauses of
0.1 second or more. The terminal allows a user to begin an
utterance before it has completed processing the previous
one, but in this experimentation rate of speech was
controlled by use of the READY indicator light located on
the tape cartridge unit. This light indicates when the
terminal is ready to accept the next utterance in both the
training and recognition modes [Ref. 25].

The Threshold 600 in its standard configuration is
composed of the following four elements:

Terminal consisting of:

   analog speech preprocessor

   LSI-11 microcomputer

   digital RS-232 input/output interface

Standard CRT/Keyboard Display Terminal

Remote Voice Input Unit (Microphone preamplifier)

Tape Cartridge Unit

The terminal, CRT display, microphone preamplifier, and tape
cartridge unit were table mounted (Figure 3) within an
acoustic sound reduction booth (Figure 4). A conventional
SHURE model SM-10 "boom" microphone, supplied as standard
equipment with the T-600, was used. The microphone possesses
a special noise cancelling design which allows the T-600 to
perform accurately despite most extraneous background noises
(Figure 5).

Figure 4.   Acoustic Sound Reduction Chamber

The speech preprocessor accepts the speech signal
input from the microphone preamplifier and passes it through
a spectral analyzer for word boundary detection. The
feature extractor monitors for 32 phonetically-relevant
features, and converts these to digital signals. Words are
detected from occurrences of low energy. A minimum pause of
0.1 second must occur to prevent confusion between words.
Any breathing noise at the end of the word is removed. The
remaining speech is divided into 16 fixed time segments, and
features are reconstructed onto the normalized 16 segment
time base.
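
The time-normalization step can be illustrated with a short
Python sketch. This is not the T-600's actual processing: the
frame representation (a list of 0/1 feature vectors), the
function name, and the rule that a feature is set in a segment
if it fires in any frame of that segment are all assumptions
made for the example.

```python
def normalize_to_segments(frames, n_segments=16):
    """Map a variable-length list of binary feature frames onto a
    fixed number of time segments.

    frames: list of equal-length lists of 0/1 values, one per frame.
    Assumption: a feature is set in a segment if it is set in any
    frame that falls inside that segment.
    """
    if not frames:
        raise ValueError("utterance contains no frames")
    n_features = len(frames[0])
    n_frames = len(frames)
    template = []
    for seg in range(n_segments):
        # Frame index range covered by this segment (at least one frame).
        start = seg * n_frames // n_segments
        end = max(start + 1, (seg + 1) * n_frames // n_segments)
        seg_bits = [0] * n_features
        for frame in frames[start:min(end, n_frames)]:
            for i, bit in enumerate(frame):
                seg_bits[i] |= bit
        template.append(seg_bits)
    return template
```

With 32 features per frame, the result is a 16 x 32 grid of
bits regardless of how long the utterance lasted.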

The microcomputer does a comparison of input signals
against stored reference patterns. Each word is represented
by 512 (16 x 32) bits of information. The closest fit
between an incoming template and the alternative stored
training templates is found, and that 'closest' word is
declared the word identity, unless the score is so low that
no decision can be made and the utterance is rejected
outright. The vocabulary reference patterns are
established by the subject 'training' the recognizer. This
is accomplished by the subject making a set number of
repetitions of the various vocabulary utterances.
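
A minimal sketch of this closest-fit decision with outright
rejection is given below. The source does not state how the
score is computed, so the Hamming distance used here, the
function names, and the rejection threshold of 128 bits are
hypothetical choices made only to show the shape of the logic.

```python
def hamming_distance(a, b):
    """Count differing bits between two equal-length 0/1 sequences."""
    return sum(x != y for x, y in zip(a, b))


def recognize(pattern, references, reject_above=128):
    """Return the best-matching vocabulary word, or None for a rejection.

    pattern:      flat list of 512 bits for the incoming utterance.
    references:   dict mapping each word to its stored 512-bit pattern.
    reject_above: hypothetical threshold; the source does not give one.
    """
    best_word, best_dist = None, None
    for word, ref in references.items():
        dist = hamming_distance(pattern, ref)
        if best_dist is None or dist < best_dist:
            best_word, best_dist = word, dist
    if best_dist is None or best_dist > reject_above:
        return None  # score too poor: utterance rejected outright
    return best_word
```

A returned word that differs from what was spoken corresponds
to a misrecognition; a return value of None corresponds to a
rejection.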

Once a match is found, the appropriate character(s)
are sent via the output interface to the CRT to indicate to
the user which utterance was recognized. These terminal
matches are further categorized as misrecognitions, where
the terminal's 'closest' match to the reference vocabulary
was not precisely the same utterance spoken, or
recognitions, in which the utterance spoken is exactly
recognized and so reflected in the CRT output. Rejection of
an utterance is a third category and is indicated by an
audible 'beep'.

The remote voice input unit allows components to be
remotely located up to 2000 feet from the terminal processor
and provides the means to adjust the volume (amplification)
of the amplifier to accommodate the normal speaking voice of
each particular subject.

The tape cartridge unit is a digital tape recorder
used to store and recall application data and an individual
subject's vocabulary reference patterns. Once the data
cartridge is recorded it contains all the information
necessary to initialize the Threshold 600 terminal for each
subject. The T-600 is capable of storing a 256 word
vocabulary which may be recorded or loaded in a few minutes
using the tape unit.
2.   Spirometer

A recording spirometer, Figure 6, a type of
gasometer, was used for measuring and recording vital
capacity. It consists of a metal tank containing a movable
piston with a water seal, air input line, exhaust valve for
resetting, ink stylus, and revolving cylinder for mounting
chart paper calibrated in cubic centimeters.
As the subject breathes into the mouthpiece, Figure
7, air replaces water in the inner piston, which rises by an
amount proportional to the exhaled air. The subject, once
fitted with the mouthpiece, is given instructions to inhale
to the greatest extent possible and then exhale all the air.
This procedure was repeated three times and the average
vital capacity used for analysis purposes.
3.   Peak Flow Meter

The Wright Peak Flow Meter was used to measure the
maximum air flow rate in a single forced expiration. The
instrument, Figure 8, consists of a pivoted vane, the
rotation of which is opposed by resistance of a spring. The
plastic mouthpiece fits into the radial inlet which leads to
the vane. Attached to the vane is a spindle and pointer.
The forced expiration causes the vane and pointer to rotate
until the maximum attainable flow has been reached. Once
reached, the pointer is held in position by a ratchet until
released by a reset button on the back of the device. The
scale is graduated in liters per minute in 15 liters/minute
divisions over a range of 60 to 1000 liters/minute.

Procedurally, the subject stands and holds the meter
in a vertical plane as depicted in Figure 9. He/she then
takes as deep a breath as possible, places the mouthpiece in
the mouth, grips it tightly with the teeth, and seals it
with his/her lips. The subject blows out as hard as
possible in a short, sharp expulsion of air. This procedure

Figure 8.   The Wright Peak Flow Meter

was performed three times with the average noted as the
appropriate peak expiratory flow.
4.   Tape  Recorder 

An AKAI 4000 DS Mk-II magnetic tape recorder was
used for the recording, storage, and reproduction of speech
sounds (Figure 10). The device is a typical analog magnetic
tape recorder consisting of three basic parts. These
include the electronics of the system, the head assembly,
and the tape transport. These components take a phenomenon,
such as the speech sound, that changes in time and record
it as a continuous event.


Figure 10.   AKAI Tape Recorder


Tapes were recorded for all 44 subjects during their
participation in the experiment. Subject to availability of
analytical software at NPS, further acoustical analysis
could be conducted on speaker variability that might
substantiate and support statistical conclusions.

D.   INSTRUMENTATION

Three questionnaires were used to elicit the
evaluations, judgements, comparisons, attitudes, and
background history of the subjects participating in the
experimentation. The first two questionnaires were designed
[Ref. 26] to provide the necessary information to delineate
subjects into various groups representing those human
factors discussed earlier. The third questionnaire was used
to measure state and trait anxiety levels during various
periods of the experiment. The questionnaires were
"author-administered" in order to provide clarification, if
needed, to any written instructions and ensure that all
respondents completed the questionnaires correctly, giving
appropriate consideration to each item.

Three types of questionnaire items were used:
open-ended, multiple choice, and rating scale. The open-ended
items permitted the subject to express his/her answer to the
question in his/her own words. In all cases, these questions
required short (one or two words) objective replies. The
multiple choice questions allowed each respondent to choose
the appropriate answer from a list of several options.
These multiple choice questions include "dichotomous" items,
for example, those requiring only a YES or NO response.
Finally, rating scale items were used to obtain judgements
or attitudes about some object, concept, or system. These
questions permitted the assignment of various response
alternatives along an unbroken continuum or in ordered
categories along the continuum. Both a graphic scale,
allowing the respondent to place his/her judgement any place
along the line, and a numerical scale, confining the
subject's response to a discrete category along the
continuum, were employed.

1.   User Questionnaire #1

User Questionnaire #1 (Appendix A) employs a
combination of question items including open-ended, multiple
choice, and graphical rating scale items. Questions 1-22
are designed to obtain information pertaining to
occupational, personal and physiological characteristics.
Questions  23-4tf  obtain  attitudinal,  comparison,  and 
evaluation information pertaining to occupational,
operational, physiological and psychological
characteristics.

2.  User  Questionnaire  #2 

User Questionnaire #2 (Appendix B) utilizes a
combination of question items including multiple choice and
graphical rating scale items. Questions 1-3 obtained
information relative to physiological factors while
questions 4-15 were repeated items from User
Questionnaire #1 designed to obtain attitudinal information
from the subjects after using speech recognition equipment
for four weeks.

3.   STAI Questionnaire

The State-Trait Anxiety Inventory (STAI) is
comprised of separate self-report scales for measuring two
distinct anxiety concepts: state anxiety (A-State) and
trait anxiety (A-Trait). This inventory was developed by
Spielberger et al. at Vanderbilt University and later
continued at Florida State University. It was reproduced
with the special permission of the Publisher, Consulting
Psychologists Press, Inc., Palo Alto, California.

The STAI A-Trait scale consists of 20 statements
(Appendix C) that ask people how they generally feel. The
A-State scale also consists of 20 statements (Appendix C)
but the instructions require subjects to indicate how they
feel at a particular moment in time. The STAI was designed
to be self-administered and was given individually to each
subject. Complete instructions are printed on each test
form, for both the A-Trait and A-State scales. There were no
time limits imposed for completion of the form. Although
many of the items have face validity as measures of anxiety,
the inventory was referred to as a Self-Evaluation
Questionnaire. Each subject responds to every STAI item by
circling the appropriate number to the right of each item
statement on the form. Scoring keys are depicted with each
scale in Appendices C and D [Ref. 27].

E.   EXPERIMENTAL DESIGN

A three-factor mixed design with repeated measures on
one factor was employed in this experiment. In
consideration of the wide variety of human factors to be
examined, the experiment was designed to allow an analysis
of three critical factors (occupational experience with
microphones, operational training method and experience)
affecting recognition accuracy while simultaneously
gathering sufficient data to accomplish subsequent analysis
on individual characteristics of speaker variability. The
two between variables were microphone experience and
training method. The third factor, experience (Week #), was
the within group variable. A summary of the experimental
design appears in Figure 11.

F.   PROCEDURE

1.   Training 


For the T-600, the training procedure consists of
entering 10 passes of each utterance into the voice
recognizer. A word list of 100 utterances (Appendix E) was
provided the subject, each utterance prompted on the CRT,

Figure 11.   Experimental Design

the 10 passes spoken, and then the next utterance on the
list would be prompted. Based on the experimental design,
subjects were divided into two groups: supervised and
non-supervised. Those supervised during training received
detailed instructions, and close scrutiny on each of the 10
passes by the experiment administrator. If the subject
failed to clearly pronounce the utterance, if volume level
was insufficient, or if the required 0.1 second pause was
omitted, the word was immediately retrained. Non-supervised
subjects received the same instructions, a short
demonstration of the training procedure and, when ready,
were allowed to train the equipment individually with no
supervision by the experiment administrator.

Training was accomplished only during the first week
of the experiment. Subjects training in the morning
(0730-1200 hours) would subsequently test during those
periods and likewise for those subjects training in the
afternoon (1400-1900 hours). Immediately after training, all
subjects made at least two passes of the entire 100 word
vocabulary (similar to a test session) to identify any
problems in training of a particular utterance. If the
utterance was correctly identified on both passes it was
considered as trained. However, if an error (either
misrecognition or non-recognition) occurred, a third pass was
made. If fewer than two of the three passes of any utterance
were correct, that utterance was retrained.
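
The retraining rule just described can be written as a small
decision function. The sketch below is illustrative only; the
function name and the boolean encoding of each pass are
assumptions, not part of the experimental procedure.

```python
def needs_retraining(pass_results):
    """Apply the post-training check to one utterance.

    pass_results: list of booleans, True if the pass was recognized
    correctly. Two passes are always made; a third is made only if
    one of the first two was in error.
    """
    if len(pass_results) == 2:
        if all(pass_results):
            return False          # trained: both passes correct
        raise ValueError("a third pass is required after an error")
    if len(pass_results) != 3:
        raise ValueError("expected two or three passes")
    return sum(pass_results) < 2  # fewer than two of three correct
```

For example, an utterance recognized on passes one and three
but not on pass two is accepted as trained, while one that is
recognized only once in three passes is retrained.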
After the equipment was trained, each subject was
measured for vital capacity and peak flow rate. Finally,
User Questionnaire #1 was administered. Total time for the
training session averaged 1.5 hours per subject.
2.   Recognition Testing

Following training, subjects were tested on the
system. Each subject made 2 passes through the entire
vocabulary list on each of three days during the week.
Duration of the experiment was three weeks. During Week #1
the vocabulary list remained in the same order as during
training (Appendix E) while in Week #2 the order of the
utterances was reversed (Appendix F) and in Week #3 the
order was randomized (Appendix G). The purpose of this
change in vocabulary order was to reduce the effect of
learning due to repetitiveness, and thereby provide a more
realistic picture of speaker variability. Data was
collected in the form of recognitions, misrecognitions, and
non-recognitions using Appendix H.
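
The three weekly presentation orders can be sketched as
follows. The actual random order used in Week #3 was fixed in
an appendix; the seeded shuffle below is only a stand-in, and
the function name and seed value are assumptions.

```python
import random


def weekly_order(vocabulary, week, seed=0):
    """Return the presentation order of the vocabulary for a given week.

    Week 1: same order as training; Week 2: reversed; Week 3: random.
    The seed is hypothetical and only makes the shuffle repeatable.
    """
    if week == 1:
        return list(vocabulary)
    if week == 2:
        return list(reversed(vocabulary))
    if week == 3:
        shuffled = list(vocabulary)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError("the experiment ran for weeks 1 through 3 only")
```

Every week presents the same 100 utterances; only their
sequence changes, so error counts remain comparable across
weeks.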

The STAI questionnaire for A-State scale measurement
was administered just prior to the first testing session
(Week #1, Trials 1-2) to determine anxiety levels prior to
using voice equipment. During Week #2 another STAI
questionnaire for A-State scale was administered following
the first test session of that week. The final STAI form
for the measurement of A-Trait scales was administered
during Week #2. User Questionnaire #2 was provided to each
subject at the conclusion of the experiment.
3.   Vocabulary

It was desired that a test vocabulary similar to a
vocabulary intended for practical application in a military
environment be used. Of concern in the design of the
vocabulary was the fact that brief monosyllabic words are
more difficult to recognize than longer polysyllabic words
or phrases. A relatively equal distribution of words and
utterances containing a syllabic content ranging from 1 to
>5 syllables was selected as the final vocabulary. The
words were chosen both from previous experimentation [Ref.
23] and the author's military experience. Appendix I
provides a listing of the 100 utterances used in the
experiment and considered as representative of use in a
military command center.

G.   VARIABLES

The dependent variable in this experiment was total
errors, a linear combination of misrecognitions and
non-recognitions. Independent variables in the overall
experimental design are experience, job function, and
training method. Additional independent variables included
each of the individual human factor characteristics elicited
earlier.
Data was collected on the eleven subjects within each
group of the experimental design. Each subject made 600
utterances per week for a grand total of 1800 for the
experiment. Total utterances for the completed experiment
numbered 79,200 (44 x 1800).
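
The utterance totals follow from the test schedule described
earlier (100 utterances, 2 passes per day, 3 days per week,
3 weeks, 44 subjects). The sketch below reproduces that
arithmetic and shows one way to express total errors as a
percentage. Treating the linear combination as a simple
unweighted sum is an assumption, as are the function names.

```python
WORDS = 100          # vocabulary size
PASSES_PER_DAY = 2   # passes through the list per test day
DAYS_PER_WEEK = 3    # test days per week
WEEKS = 3            # duration of testing
SUBJECTS = 44


def utterance_counts():
    """Return (per subject per week, per subject, whole experiment)."""
    per_week = WORDS * PASSES_PER_DAY * DAYS_PER_WEEK
    per_subject = per_week * WEEKS
    return per_week, per_subject, per_subject * SUBJECTS


def total_error_rate(misrecognitions, non_recognitions, utterances):
    """Total errors (assumed unweighted sum) as a percent of utterances."""
    return 100.0 * (misrecognitions + non_recognitions) / utterances
```

Under these assumptions a subject with 30 misrecognitions and
12 non-recognitions in one week of 600 utterances would have a
total error rate of 7 percent for that week.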
V.   ANALYSIS AND RESULTS

A.   GENERAL 

All analyses were performed using the MINITAB
statistical package [Ref. 28]. Repeated measures analyses
of variance procedures were performed in accordance with
guidance provided by Bruning and Kintz [Ref. 29].
Non-parametric tests for significance between pairs of means,
several independent samples, and for trend analysis were
conducted utilizing procedures discussed by Conover [Ref.
30]. Additional parametric analysis followed procedures
prescribed by Ott [Ref. 31].

All mean error rates that appear in figures are of
untransformed data. Since the F test in an analysis of
variance is valid even with mild departures from the
assumption of equality of variances [Ref. 31: p. 63L'],
Hartley's Test for homogeneity of population variances was
used to determine whether an extreme case (unequal
variances) existed and thereby determine if a transformation
of data would be required to stabilize the variances.
Results of this test are presented in Table IV. The
assumption of equal variances is the basis for the use of
untransformed data in all subsequent analyses.
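
Hartley's statistic is the ratio of the largest to the
smallest group variance. The sketch below applies it to the
four group variances as they are legible in Table IV; the
function and variable names are illustrative. The computed
ratio comes out close to the reported value of 2.895 and well
below the tabulated critical value of 5.67.

```python
def hartley_f_max(variances):
    """Hartley's F-max: largest sample variance over the smallest."""
    return max(variances) / min(variances)


# Group variances as printed in Table IV (groups I through IV).
group_variances = [1947.42, 3666.30, 2625.62, 5626.95]

f_max = hartley_f_max(group_variances)
critical_value = 5.67              # tabulated F-max at the .05 level
reject_null = f_max > critical_value
```

Because the ratio does not exceed the critical value, the
hypothesis of equal variances is retained.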

The correlation coefficient reported herein is
Spearman's Rho. Although the Pearson Product Moment

TABLE IV
TEST FOR EQUALITY OF VARIANCES

DATA:
   $s^2$ (group I)    =  1947.42
   $s^2$ (group II)   =  3666.30
   $s^2$ (group III)  =  2625.62
   $s^2$ (group IV)   =  5626.95

HYPOTHESES:
   H0:  All population variances are equal
   H1:  Not all population variances are the same

TEST STATISTIC:
   $F_{Max} = s^2_{Max} / s^2_{Min} = 2.895$

DECISION:
   Level of significance:  .05
   Tabulated value of $F_{Max}$ = 5.67
   CANNOT REJECT THE NULL HYPOTHESIS


correlation coefficient 'r' is most commonly reported, it is,
however, a random variable, and as such has a distribution
function. Conover [Ref. 30] states that 'r' has no value as
a test statistic in nonparametric tests unless the
distribution is known.
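
Spearman's Rho is the Pearson correlation computed on ranks
rather than raw values, which is why it does not depend on the
underlying distribution. A minimal pure-Python sketch follows;
the function names are illustrative, and tied values are given
their average rank.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

The function is undefined when either variable is constant,
since the rank variance is then zero.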
B.   OCCUPATIONAL CHARACTERISTICS

1.   Hypotheses

The  following  hypotheses  pertaining  to  the 
occupational  characteristics  of  speakers  using  voice 
recognition  equipment  were  tested: 


a.  H0:  Job function (microphone experienced users
         versus non-microphone experienced users)
         will have no effect on recognition
         accuracy.

    H1:  Job function (microphone experience)
         affects recognition accuracy.


b.  H0:  The branch of service the military member
         belongs to will have no effect on
         recognition accuracy.

    H1:  Recognition accuracy is influenced by the
         branch of service of the user.


c.  H0:  A user's attitude pertaining to his/her
         present job satisfaction will have no
         effect on recognition accuracy.

    H1:  Job satisfaction affects recognition
         accuracy.


d.  H0:  The degree of satisfaction a user derives
         from being a member of the military will
         not affect recognition accuracy.

    H1:  Service satisfaction has an effect on
         recognition accuracy.


e.  H0:  The amount of previous computer experience
         a user has had will not affect recognition
         accuracy.

    H1:  Previous computer experience affects
         recognition accuracy.


f.  H0:  Competency in a foreign language (bi- or
         multilingual) will have no effect on
         recognition accuracy.

    H1:  Competency in a foreign language will
         affect recognition accuracy.


2.   Job Function

The results of the experiment for users with and
without microphone experience are shown graphically in
Figure 12. Microphone experienced users fared only slightly
better than non-microphone experienced users. The analysis
of variance (ANOVA) results in Table V substantiate this,
showing an F ratio of .277 indicating no statistically
significant difference in the user's job function. Thus,
the null hypothesis cannot be rejected.
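
Each F ratio in Table V is a mean square for an effect divided
by the appropriate error mean square: between-subjects effects
are tested against Error(b) and within-subjects effects against
Error(w). The sketch below illustrates the calculation with
the MIC x TNG interaction row, chosen only because its printed
values are fully legible; the function name is an assumption.

```python
def f_ratio(ms_effect, ms_error):
    """F ratio: effect mean square divided by error mean square."""
    return ms_effect / ms_error


# MIC x TNG interaction tested against the between-subjects error term.
ms_mic_by_tng = 1759.69   # mean square from Table V (df = 1)
ms_error_b = 1156.41      # Error(b) mean square from Table V (df = 40)

f_mic_by_tng = f_ratio(ms_mic_by_tng, ms_error_b)
```

The result agrees with the tabulated value of 1.521 to within
rounding.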

Figure 12.   Mean Error Rate vs. Job Function

TABLE V
ANALYSIS OF VARIANCE FOR RECOGNITION ACCURACY

SOURCE                              SS        df       MS         F

TOTAL                            72296.0e    131       --        --
BETWEEN SUBJECTS                 54082.60     43       --        --
   Microphone Experience (MIC)     426.61      1     426.61     .277   NS
   Training Method (TNG)          5629.50      1    5629.50    4.668   **
   MIC x TNG                      1759.69      1    1759.69    1.521   NS
   Error(b)                      46256.60     40    1156.41      --
WITHIN SUBJECTS                  19212.41     88       --        --
   Trials (TR)                    4224.19      2    2162.09   11.696   **
   TR x MIC                         12.50      2       6.75     .037   NS
   TR x TNG                         74.22      2      37.16     .201   NS
   TR x MIC x TNG                   13.09      2      6.545     .025   NS
   Error(w)                      14788.40     80     184.65      --

[ ** : SIGNIFICANT at p < .05 ]
[ NS : NOT SIGNIFICANT for p < .05 ]

Microphone Experience:  Experienced vs. Non-experienced
Training Method:        Supervised vs. Non-supervised
Trials:                 Week #1 (Words 1-100)
                        Week #2 (Words 100-1)
                        Week #3 (Words in random order)

Mean total error rates for microphone and
non-microphone experienced users are summarized in Table VI.
The definitive decrease in error rates over time will be
discussed later in the review of operational characteristics.


TABLE VI
MEAN TOTAL ERROR RATES FOR JOB FUNCTION BY WEEKS
(in Percent)

                        MICROPHONE    NO MICROPHONE    MEAN
                        EXPERIENCE    EXPERIENCE       (WEEKS)

WEEK #1                    7.04           7.78           7.41
WEEK #2                                   6.71           6.47
WEEK #3                    4.79                          5.09

MEAN (JOB FUNCTION)        6.02           6.62           6.32

3.   Branch  of  Service 

Three branches of service were represented in the
experiment with civilian subjects categorized as a fourth
branch. A Kruskal-Wallis test for k > 2 samples was used to
determine if any differences existed. Table VII provides
the synopsis of results. The null hypothesis, that branch
of service will not affect recognition accuracy, is clearly
rejected. Multiple comparisons were made to determine
between which pairs of means the differences occurred. The
results of this test indicated significant differences
between Army/Navy and Army/Air Force. Differences between
Civilian/Army, Civilian/Air Force, Civilian/Navy and
Navy/Air Force were not significant.

Further inspection of these results indicated
possible confounding due to experience with voice
recognition equipment. All Air Force personnel and 2 out of
8 Navy personnel were experienced users. Segregating the
experienced and naive users into separate categories and
then reconducting the analysis for effect by branch of
service showed no statistical significance (Table VII).
Using the original hypotheses established, the null cannot
be rejected in either the naive only or experienced only
cases. Mean error rates by branch of service for all, naive
only and experienced only subjects, are presented
graphically in Figure 13.
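
For reference, the Kruskal-Wallis statistic used in this
comparison ranks all observations together and compares the
rank sums of the groups. The sketch below is a generic
pure-Python version, not the MINITAB routine used in the
study; it assigns average ranks to ties but applies no tie
correction, and the function name is an assumption.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for k independent samples.

    groups: list of lists of observations. Ties receive average
    ranks; no tie correction is applied to H in this sketch.
    """
    pooled = [(value, g) for g, sample in enumerate(groups)
              for value in sample]
    pooled.sort(key=lambda pair: pair[0])
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        i = j + 1
    weighted = sum(r * r / len(sample)
                   for r, sample in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * weighted - 3.0 * (n + 1)
```

The statistic is compared against a chi-square distribution
with k - 1 degrees of freedom; a large H indicates that at
least one group tends to produce higher or lower error rates
than the others.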

TABLE VII
EFFECT BY BRANCH OF SERVICE

                    ALL SUBJECTS      NAIVE             EXPERIENCED

Type of Test        Kruskal-Wallis    Kruskal-Wallis    Kruskal-Wallis
Alpha               .05               .05               .05
Test Statistic      11.90 **          2.79              '2^
Critical Level      .efe:75           .25               .90

** = Significant at stated level of significance

Figure 13.   Mean Error Rate vs. Branch of Service

4.   Job and Service Satisfaction

Subjects were divided into four groups based upon
their subjective responses and included:

a.   Persons who disliked their jobs

b.   Those who were borderline or neutral in their
feelings

c.   Individuals who liked their present job

d.   Persons who indicated a very definite liking of
their job (liked their job very much)

The  attained  test  statistic  (Table  VIII)  leads  to  the 
decision  that  the  null  hypothesis  cannot  be  rejected.  The 
correlation  coefficient  between  the  two  variables  was  not 
significant  and  it  is  concluded  that  there  is  no  apparent 
correlation  between  the  satisfaction  a  user  bas  for  bis/her 
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TABLE  VIII 
AJIECT  BY  JOB/SERVICE  SATISFACTION 


i  JOB  SATISFACTION 


SERVICE  SATISFACTION 


Type  of  Test  i   KruskalrWallis 


Xruskal-Wallis 


Alpfc. 


.05 


.05 


Test 
Statistic 


4.60 


.2iy 


Critical 
Level 


.20 


.90 


Correlation  j 

Coefficient  i 


.016 


.041 


**  =  Significant  at  stated  level  of  significance 


current  jcc  and  how  well  that  user  will  perform  witn  voice 
recognition  equipment.  This  particular  human  factor  is 
nevertheless  worthy  of  further  examination  in  the  future  in 
terms  of  users  whcse  current  jet  entails  the  day  tc  day  use 
of  voice  equipment  . 

In the analysis of the effect service satisfaction has on
recognition accuracy, the 2 civilians were removed from the
sample population. Subjects were then divided into three
groups based upon their subjective responses:

a.  Those who are unsatisfied or don't care

b.  Those who are reasonably satisfied

c.  Those who are very satisfied with their respective
    service

The test statistic (Table VIII) reveals no significant
difference between groups, and therefore the null
hypothesis, that the degree of satisfaction a speaker
derives from being in the armed services will not affect
recognition accuracy, cannot be rejected. Correlation
between service satisfaction and total error rates, as
before, was not significant, thus indicating little or no
correlation between the random variables.
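The Kruskal-Wallis decisions above can be retraced with a
short sketch. It is illustrative only: the function names and
the sample data are hypothetical, the H statistic is computed
without a tie correction, and the critical level is taken
from the chi-square approximation with k - 1 degrees of
freedom (3 for the four job satisfaction groups, 2 for the
three service satisfaction groups), which is assumed here
rather than confirmed as the exact procedure used.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of samples (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ordered = sorted(pooled)

    def rank(v):
        # Tied values share the average of the rank positions they occupy.
        return ordered.index(v) + (ordered.count(v) + 1) / 2

    n = len(pooled)
    total = sum(sum(rank(v) for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def chi2_upper_tail(x, df):
    """Upper-tail chi-square probability; closed forms for df = 2 and 3 only."""
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("this sketch handles df = 2 or df = 3 only")


# Hypothetical error rates (percent) for three groups of users.
sample = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h = kruskal_wallis_h(sample)

# Critical levels implied by the test statistics reported in Table VIII.
job_level = chi2_upper_tail(4.60, df=3)       # four job satisfaction groups
service_level = chi2_upper_tail(0.219, df=2)  # three service satisfaction groups
```

Under these assumptions the two critical levels round to .20
and .90, in line with the values listed in Table VIII.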

5.  Previous Computer Experience

Subjects were divided into four groups based upon their
response to question #32 in User Questionnaire #1. The
groups comprised persons with:

a.  No experience

b.  Very little experience

c.  Some or moderate experience

d.  Considerable experience (data processors)

The analysis provided a test statistic (Table IX) which
resulted in the rejection of the null hypothesis and the
conclusion that previous computer experience will affect
recognition accuracy. Multiple comparisons were performed
to determine which pairs of means differed. Significant
differences occurred between users with no and considerable
experience, very little and moderate experience, and very
little and considerable experience. These results
demonstrate that experience with data/keyboard input
procedures provides a higher recognition accuracy. An
explanation for this occurrence may be, for example, a data
processor's awareness of the time involved in manual entry
and of the associated error rate. The advantages that voice
input offers to computer experienced personnel may well be
a psychological or motivational factor in addition to an
occupational characteristic.

These results are further substantiated by the computed
correlation coefficient. Performing a one-tail test for
negative correlation, with mutual independence as the null
hypothesis, we were able to reject this hypothesis and
conclude that as computer experience increases, recognition
error rates will decrease (Critical Level: < .001). Mean
error rates for the four groups are shown graphically in
Figure 14.

TABLE IX
AFFECT OF COMPUTER EXPERIENCE

                         | COMPUTER EXPERIENCE
-------------------------+--------------------
Type of Test             | Kruskal-Wallis
Alpha                    | 0.05
Test Statistic           | 14.287 **
Critical Level           | < .005
Correlation Coefficient  | -.516 **

** = Significant at stated level of significance
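A one-tail test for negative correlation of this kind can be
sketched as follows. The sketch assumes a Spearman rank
correlation with the large-sample normal approximation
z = rho * sqrt(n - 1), and a sample size of 44; the text does
not name the coefficient used, so the coefficient type, the
function names and the sample data are all assumptions.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def lower_tail_level(rho, n):
    """Approximate one-tail critical level for H1: negative correlation."""
    z = rho * math.sqrt(n - 1)
    return 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z


# Hypothetical data: experience group (1-4) against error rate (percent).
experience = [1, 2, 3, 4]
error_rate = [8.0, 6.0, 4.0, 2.0]
rho = spearman_rho(experience, error_rate)

# Critical level implied by the coefficient in Table IX, assuming n = 44.
level = lower_tail_level(-0.516, 44)
```

With these assumptions the critical level for a coefficient
of -.516 falls below .001, which agrees with the level quoted
in the text.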


[Figure 14: bar chart of mean % error rate (vertical axis
0.0 to 8.0) by level of computer experience. Categories:
None, Very Little, Moderate, Considerable. Bar labels
recovered, in order: 7.62, 7.87, 5.63, 3.[?]6.]

Figure 14.  Mean Error Rate vs. Computer Experience

6.  Foreign Language Competency

Recognition accuracy was compared between two groups, those
with a fluent proficiency in a foreign language and those
without. 32 subjects possessed no capability in a second
language, whereas 11 were competent in one or more
languages. The median total error rate for both groups was
6.28%. A two-sample non-parametric test, the Mann-Whitney,
was performed to detect the existence of any differences
between the two groups. The computed test statistic
(Table X) clearly shows no significance at the .05 level
and therefore the null hypothesis cannot be rejected. The
critical regions for this two-tail test included values of
the test statistic less than 672 or greater than 814.8.

TABLE X
AFFECT OF COMPETENCY IN ANOTHER LANGUAGE

                | FOREIGN LANGUAGE
----------------+-----------------
Type of Test    | Mann-Whitney
Alpha           | 0.05
Test Statistic  | 754.5
Critical Level  | .6776

** = Significant at stated level of significance


C.  OPERATIONAL CHARACTERISTICS

1.  Hypotheses

The following hypotheses apply to the operational
characteristics under which the subjects were tested.

a.  H0: The method of training a user for voice recognition
        operation (supervised versus non-supervised) will
        not affect recognition accuracy.

    H1: Method of training will affect recognition
        accuracy.

b.  H0: The time of day in which a user trains the
        equipment will not affect recognition accuracy.

    H1: Recognition accuracy of the user will be affected
        by the time of day in which he/she trains the voice
        recognizer.

c.  H0: The period of the week in which the user trains the
        equipment will not affect recognition accuracy.

    H1: The period of the week in which the equipment is
        trained will affect recognition accuracy.

d.  H0: Experienced users will acquire the same or greater
        error rates than inexperienced (naive) users.

    H1: Experienced users will have lower error rates than
        naive users.

    H0: Recognition accuracy will not be affected by weekly
        experience.

    H1: A user will demonstrate reduced error rates
        (decreasing trend) as experience with voice
        recognition equipment increases.

e.  H0: The operational ease with which voice recognition
        equipment may be used will have no effect on
        recognition accuracy.

    H1: Ease of use will affect recognition accuracy.


2.  Method of Training

The results of the experiment for users receiving either
supervised or non-supervised training are depicted
graphically in Figure 15. Users who received supervision in
the training mode fared significantly better than those who
did not. The analysis of variance (ANOVA) in Table V
substantiates this claim, providing an F ratio of 4.[?]66
and a critical level of approximately .035. Thus, the null
hypothesis is rejected and we may conclude that the method
of training does affect recognition accuracy. Mean total
error rates for supervised and non-supervised users are
summarized in Table XI.


[Figure 15: bar chart of mean % error rate (vertical axis
0.0 to 8.0) for Supervised Training and Non-Supervised
Training. Bar label recovered: 7.41 (Non-Supervised
Training).]

Figure 15.  Mean Error Rate vs. Training Method


TABLE XI
MEAN TOTAL ERROR RATES FOR METHOD OF TRAINING BY WEEKS
(in Percent)

                | SUPERVISED  | NON-SUPERVISED |
                | TRAINING    | TRAINING       | X WEEKS
----------------+-------------+----------------+------------
WEEK #1         | 6.21        | [?].64         | 7.41
WEEK #2         | [illegible] | 7.[illegible]  | 6.47
WEEK #3         | 4.17        | 6.00           | 5.09
X JOB FUNCTION  | 5.2[?]      | 7.41           | [illegible]

3.  Time of Day and Week

Subjects were blocked by time of day (morning and
afternoon) and by time of week: early (Monday-Tuesday), mid
(Wednesday-Thursday) or late (Friday-Saturday). A
Mann-Whitney test was performed to determine if differences
existed between the two time of day groups. Morning users
had a median error rate of 5.1% while afternoon users had a
6.67% error rate. Because of equal sample sizes, a
parametric t-test was performed to confirm the results of
the non-parametric test. The results presented in Table XII
will not allow us to reject the null hypothesis. Critical
regions for the Mann-Whitney test included values of the
test statistic less than 411.5 and greater than 576.5.

With three groups in the time of week variable, the
analysis utilized the Kruskal-Wallis test for determination
of differences among the groups. The null hypothesis cannot
be rejected, since the test statistic is less than 5.99,
the Chi-square value with two degrees of freedom. The
correlation coefficient was found to be significant at the
0.05 level in a test for negative correlation. A premature
conclusion that training occurring in the latter portion of
the week would yield lower error rates appeared to be
counter-intuitive. It was thought that fatigue and the
interruption of a weekend would result in poorer training
efforts and hence lead to higher error rates in the future.
Upon further analysis, this reversed correlation was found
to be the result of possible confounding arising from the
large number of experienced users who trained in the later
period of the week. Eight out of thirteen late week users
were experienced and, with their removal from
consideration, the correlation between time of week and
total error rate became statistically non-significant.


TABLE XII
AFFECT OF TIME OF DAY AND WEEK

                         | TIME OF DAY                | TIME OF WEEK
                         | Mann-Whitney | t-test      | Kruskal-Wallis
-------------------------+--------------+-------------+---------------
Alpha                    | 0.05         | 0.05        | 0.05
Test Statistic           | 469          | -1.16       | 4.14
Critical Level           | .275         | [illegible] | .25
Correlation Coefficient  | .[?]93       | .093        | -2.67 **

** = Significant at stated level of significance

4.  User Experience

Two sets of hypotheses in Section V.C.1.d are incorporated
into this phase of the analysis. The analysis of the first
set was performed using the Mann-Whitney test and the
associated results are summarized in Table XIII. The median
error rate for naive users was 7.26%, while experienced
users attained a 2.7[?]% error rate. Both groups had equal
numbers of supervised and unsupervised users. The
correlation coefficient yielded one of the strongest
correlations between two variables within the experiment.
The null hypothesis can be rejected and it is therefore
concluded that experience will affect recognition accuracy.


TABLE XIII
AFFECT DUE TO USER EXPERIENCE

                         | EXPERIENCE
-------------------------+--------------
Type of Test             | Mann-Whitney
Alpha                    | 0.05
Test Statistic           | [?]69.0 **
Critical Level           | < .0001
Correlation Coefficient  | -.599 **

** = Significant at stated level of significance


The analysis of the second hypothesis of V.C.1.d is
depicted graphically in Figure 16 (Trials by Job Function)
and Figure 17 (Trials by Training Method). In each case no
interaction is present, with the weekly error rate showing
a steady drop of approximately .8 to 1.4% each week. This
graphical interpretation is confirmed statistically by the
ANOVA presented in Table V. That is, the F ratio is well
above the 3.11 required for a level of significance of
0.05. The null hypothesis is rejected and it is concluded
that users will improve (reduce) their error rates through
weekly iteration.

[Figure 16: mean % error rate (vertical axis 0.0 to 8.0) by
week for two job function groups. Microphone Experience:
7.04, 6.23, 4.79. No Microphone Experience: 7.78 (Week #1),
6.7[?] (Week #2), [?].39 (Week #3).]

Figure 16.  Trials versus Job Function

[Figure 17: mean % error rate (vertical axis 0.0 to 8.0) by
week for Supervised Training and Non-Supervised Training.
Labels recovered for Non-Supervised Training: [?].61
(Week #1), [illegible] (Week #2), 6.00 (Week #3).]

Figure 17.  Trials versus Training Method

This conclusion was further verified by application of the
Cox and Stuart Test for Trend. The following comparisons
were made:

a.  Week #1 and Week #2

b.  Week #2 and Week #3

c.  Week #1 and Week #3

In all three cases, the null hypothesis, that there is no
downward trend, was clearly rejected.
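For a week-to-week comparison of this kind, the Cox and
Stuart test reduces to a sign test on paired observations.
The sketch below shows that reduction under the assumption
that each subject's error rate in the earlier week is paired
with the same subject's rate in the later week; the function
name and the data are hypothetical and are not taken from
the experiment.

```python
from math import comb


def downward_trend_level(earlier, later):
    """One-sided sign-test critical level for H1: later values are lower.

    Pairs with equal values carry no information about direction and
    are discarded, as is usual for a sign test.
    """
    drops = sum(1 for a, b in zip(earlier, later) if b < a)
    rises = sum(1 for a, b in zip(earlier, later) if b > a)
    n = drops + rises
    if n == 0:
        return 1.0
    # P(X >= drops) for X ~ Binomial(n, 1/2) under the no-trend hypothesis.
    return sum(comb(n, k) for k in range(drops, n + 1)) / 2 ** n


# Hypothetical error rates (percent) for eight subjects in two weeks.
week_1 = [7.0, 8.5, 6.0, 9.0, 5.5, 7.5, 6.5, 8.0]
week_2 = [6.0, 7.0, 6.5, 7.5, 5.0, 6.0, 6.0, 7.0]

level = downward_trend_level(week_1, week_2)
reject_no_trend = level < 0.05
```

In this made-up sample seven of the eight subjects improve,
so the hypothesis of no downward trend would be rejected at
the 0.05 level.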


5.  Ease of Use

Based on subjective responses by those participating in the
experiment, four groups were categorized. They include:

a.  Users who consider voice recognition equipment
    difficult to use.

b.  Those who had no opinion either way.

c.  Users who stated that voice equipment is easy to use.

d.  Those who feel that voice recognition equipment is very
    easy to use.

The results of this analysis are summarized in Table XIV.
The test statistic is less than the Chi-square value of
9.4[illegible] with three degrees of freedom and therefore
the null cannot be rejected. The computed correlation
coefficient is not significant at the 0.05 level.

TABLE XIV
AFFECT DUE TO EASE OF USE OF VOICE EQUIPMENT

                         | EASE OF USE
-------------------------+---------------
Type of Test             | Kruskal-Wallis
Alpha                    | 0.05
Test Statistic           | 4.814
Critical Level           | > .25
Correlation Coefficient  | .157

** = Significant at stated level of significance

D.  PERSONAL CHARACTERISTICS

1.  Hypotheses

The following hypotheses were tested pertaining to the
personal characteristics of voice recognition users:

a.  H0: Race of the user will not affect recognition
        accuracy.

    H1: A difference in recognition accuracy exists between
        users of different race.

b.  H0: The marital status of the user will not affect
        recognition accuracy.

    H1: A user's marital status will have an effect on
        his/her recognition accuracy.

    H0: Size of a user's family will not affect recognition
        accuracy.

    H1: Family size will have an effect on recognition
        accuracy.

c.  H0: The religious preference/background of a user will
        have no effect on his/her recognition accuracy.

    H1: A user's religious preference/background will
        affect recognition accuracy.

d.  H0: A person's accent will not affect his/her
        recognition accuracy.

    H1: Accent affects recognition accuracy.

e.  H0: The place of birth of a user will have no effect on
        recognition accuracy.

    H1: One's place of birth affects recognition accuracy.

    H0: The geographic origin of a person will not affect
        his or her recognition accuracy.

    H1: A person's recognition accuracy will be affected by
        geographic origin.

f.  H0: The level of education an individual has attained
        will not affect his/her recognition accuracy.

    H1: Education level of the user affects recognition
        accuracy.

g.  H0: The socio-economic class of a user will not affect
        recognition accuracy.

    H1: A user's recognition accuracy will be affected by
        socio-economic class standing.

h.  H0: Past oral surgery or orthodontal care will not
        affect recognition accuracy of the user.

    H1: Recognition accuracy of the user will be affected
        if he or she has undergone oral surgery or
        orthodontal care.

2.  Race

Two racial backgrounds were represented in the sampled
population. Thirty-eight Caucasian and six Negro subjects
participated in the experimentation. The median total error
rate was 6% for Caucasian personnel and 6.6% for Negro
users. A Mann-Whitney test was performed to detect the
presence of any difference between the two groups. The
calculated test statistic (Table XV) was not significant at
the .05 level and the null hypothesis cannot be rejected.
Critical regions for the test statistic in this two-tail
test were values less than 79[?] and greater than 91[?].

TABLE XV
AFFECT OF RACE ON RECOGNITION ACCURACY

                | RACE
----------------+--------------
Type of Test    | Mann-Whitney
Alpha           | 0.05
Test Statistic  | 84[?].0
Critical Level  | .6941

** = Significant at stated level of significance
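The two-tail critical regions quoted for the Mann-Whitney
tests can be approximated from the group sizes alone. The
sketch below assumes that the test statistic is the rank sum
of the larger group, that the large-sample normal
approximation is used, and that ties are ignored; the
function names are hypothetical and the thesis may have used
exact tables instead.

```python
import math


def rank_sum_moments(n1, n2):
    """Mean and standard deviation of the rank sum of the n1 group under H0."""
    total = n1 + n2
    mean = n1 * (total + 1) / 2
    sd = math.sqrt(n1 * n2 * (total + 1) / 12)
    return mean, sd


def two_tail_bounds(n1, n2, z=1.96):
    """Lower and upper critical bounds for a two-tail test (z = 1.96 for .05)."""
    mean, sd = rank_sum_moments(n1, n2)
    return mean - z * sd, mean + z * sd


def two_tail_level(t, n1, n2):
    """Approximate two-tail critical level for an observed rank sum t."""
    mean, sd = rank_sum_moments(n1, n2)
    return math.erfc(abs(t - mean) / (sd * math.sqrt(2)))


# Group sizes reported for the race comparison: 38 and 6 subjects.
lower, upper = two_tail_bounds(38, 6)
```

For 38 and 6 subjects this gives bounds of roughly 797.7 and
912.3, which is compatible with the partly legible critical
values in the text.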


3.  Marital Status and Family Size

The sample population consisted of 14 single, [?]5 married,
2 divorced, and [illegible] other (separated, widowed)
personnel. A Kruskal-Wallis test for k > 2 samples was used
to determine if any differences in means existed between
the groups. Because the computed test statistic (Table XVI)
is less than 7.815, the tabulated chi-square value with 3
degrees of freedom, the null hypothesis cannot be rejected.
No correlation coefficient was computed for marital status
due to the nominal scale of measurement.


TABLE XVI
AFFECT OF MARITAL STATUS AND FAMILY SIZE

                         | MARITAL STATUS | FAMILY SIZE
-------------------------+----------------+---------------
Type of Test             | Kruskal-Wallis | Kruskal-Wallis
Alpha                    | [illegible]    | [illegible]
Test Statistic           | 2.61           | 219 [sic]
Critical Level           | > .3           | [illegible]
Correlation Coefficient  | NA             | .043

** = Significant at stated level of significance


The sample population was subdivided into five groups for
family size, with a range from no children to subjects
having four or more children. A Kruskal-Wallis test was
again used to determine if a difference existed and, as
before, the null hypothesis cannot be rejected. The
computed correlation coefficient indicates mutual
independence between family size and total error rate of a
voice recognition user.

4.  Religious Preference

Although a diverse variety of religious preferences were
enumerated by participating subjects, some were pooled to
preclude numerous sample sizes of just one person. For
example, Methodist and Episcopalian were combined into the
Protestant category and so forth. In all, six groups were
represented: Catholic, Protestant, Jewish, Baptist, No
Preference and Others (those who could not be readily
grouped into one of the aforementioned categories). Using
the Kruskal-Wallis test to check for differences between
means, the obtained test statistic (Table XVII) does not
allow for the rejection of the null hypothesis. Therefore,
it may be concluded that the religious preference of the
user will not affect his/her recognition accuracy.

TABLE XVII
AFFECT OF RELIGIOUS PREFERENCE

                | RELIGIOUS PREFERENCE
----------------+---------------------
Type of Test    | Kruskal-Wallis
Alpha           | 0.05
Test Statistic  | 3.25
Critical Level  | [illegible]

** = Significant at stated level of significance

5.  Accent

Ten subjects possessed some type of noticeable accent, as
determined by the subject and experiment administrator.
Seven were Southern and three were categorized as Other
(Spanish, Bostonian). Remaining subjects were placed in a
'No Accent' group. The resultant test statistic
(Table XVIII) was slightly less than the tabulated
Chi-square value of 5.991 with two degrees of freedom. As
such, the null hypothesis cannot be rejected. An additional
check was accomplished by combining the two accent groups
into one generic entity and performing a Mann-Whitney test
to detect a difference between the two groups. Again the
null hypothesis cannot be rejected at the stated level of
significance. Correlation analysis was not performed due to
the nominal scale of measurement.

TABLE XVIII
AFFECT OF ACCENT ON RECOGNITION ACCURACY

                | ACCENT (3 groups) | ACCENT (2 groups)
----------------+-------------------+------------------
Type of Test    | Kruskal-Wallis    | Mann-Whitney
Alpha           | .05               | .05
Test Statistic  | [?].73            | 734
Critical Level  | [?] .05           | .09

** = Significant at stated level of significance

Although the null is not rejected, the critical level is
close to the stated level of significance. Mean error rates
are therefore illustrated in Figure 18 for further
examination.


[Figure 18: bar chart of mean % error rate (vertical axis
0.0 to 11.0) by accent group. Categories: No Accent,
Southern, Other. Bar labels recovered: 11.4, 5.73 and one
illegible value.]

Figure 18.  Mean Error Rate vs. Accent

6.  Place of Birth and Geographic Origin

Subjects were asked to provide their state of birth and
their responses were subsequently classified into one of
the following six generic groups:

a.  Overseas

b.  Northeast United States

c.  Southeast United States

d.  Mid-Central United States

e.  Southwest United States

f.  Western United States

Applying the Kruskal-Wallis test to the compiled data, the
obtained test statistic (Table XIX) is insufficient to
reject the stated null hypothesis.

Because a person's place of birth is not necessarily the
environment in which that individual grew up (i.e., during
ages 2-18), data pertaining to geographic origin was also
tested to determine if any negative effect would be
encountered. The geographic areas used were the same as for
place of birth. Calculated results point to the same
conclusion: the null hypothesis of Section V.D.1.e cannot
be rejected.

TABLE XIX
AFFECT OF PLACE OF BIRTH AND GEOGRAPHIC ORIGIN

                | PLACE OF BIRTH | GEOGRAPHIC ORIGIN
----------------+----------------+------------------
Type of Test    | Kruskal-Wallis | Kruskal-Wallis
Alpha           | .05            | .05
Test Statistic  | 5.32           | 4.09
Critical Level  | > .25          | > .25

** = Significant at stated level of significance

7.  Level of Education

The sampled population was partitioned into the following
five categories:

a.  High School graduates.

b.  Individuals with 1 to 4 years of college but no degree.

c.  College graduates.

d.  Individuals working toward a graduate degree.

e.  Persons accorded a graduate degree such as a Masters or
    Doctorate.

The data obtained from the five groups was tested for any
significant difference between groups. The test statistic
(Table XX) leads to the rejection of the null hypothesis
and the conclusion that level of education affects the
overall error rate for voice recognition users. A
relatively strong correlation exists, with a critical level
of 0.006: as the individual increased in level of
education, a concomitant decrease in error rate occurred.

Multiple comparisons between the various groups showed the
predominant influence to be graduate students. Further
examination indicated possible confounding due to that
group's prior experience with voice recognition equipment.
Eleven out of twelve graduate students were experienced
users. These experienced users were stripped out of the
sample and the Kruskal-Wallis test applied to only those
that were naive to voice technology. Using the same
hypotheses, the obtained test statistic does not allow for
the rejection of the null. This, and the recomputed
correlation coefficient, corroborate the theory of
confounding, and the earlier conclusion is now amended to
state that level of education will not affect recognition
accuracy. Mean error rates for all education levels are
shown graphically in Figure 19. Error rates for both the
total sample population and naive users only are included.

TABLE XX
AFFECT OF LEVEL OF EDUCATION

                         | EDUCATION (ALL) | EDUCATION (NAIVE)
-------------------------+-----------------+------------------
Type of Test             | Kruskal-Wallis  | Kruskal-Wallis
Alpha                    | .05             | .05
Test Statistic           | 14.200 **       | 4.18
Critical Level           | 015 [sic]       | 25 [sic]
Correlation Coefficient  | [?].260 **      | [illegible]

** = Significant at stated level of significance

Figure 19.  Mean Error Rate vs. Education

8.  Socio-economic Class

A variety of socio-economic classes were presented to the
participants for selection, with one of the following five
chosen by each subject:

a.  Upper lower class

b.  Lower middle class

c.  Middle class

d.  Upper middle class

e.  Lower upper class

The analysis of total error rates for these five groups
(Table XXI) yielded a test statistic that would not allow
for the rejection of the null hypothesis, and it may be
concluded that socio-economic class will not affect
recognition accuracy. The negative correlation indicates
that individuals of a lower socio-economic class tend to
acquire higher error rates, although the coefficient is not
significant at the 0.05 level (critical level: 0.158).

TABLE XXI
AFFECT OF SOCIO-ECONOMIC CLASS

                         | SOCIO-ECONOMIC CLASS
-------------------------+---------------------
Type of Test             | Kruskal-Wallis
Alpha                    | 0.05
Test Statistic           | 1.95
Critical Level           | .83
Correlation Coefficient  | -0.152

** = Significant at stated level of significance


9.  Dental

Subjects were queried as to their history of dental care,
in particular, oral surgery and/or orthodontal correction.
Two groups resulted, upon whose data a Mann-Whitney test
was performed to determine if any difference existed
between them. On the basis of the computed test statistic
(Table XXII), the null hypothesis cannot be rejected.
Critical regions for the test statistic included values
greater than 714.69 and less than 635.21.

TABLE XXII
AFFECT OF PAST AND/OR PRESENT DENTAL CARE

                | DENTAL CARE
----------------+--------------
Type of Test    | Mann-Whitney
Alpha           | 0.05
Test Statistic  | 538.50
Critical Level  | .3643

** = Significant at stated level of significance


E.  PHYSIOLOGICAL CHARACTERISTICS

1.  Hypotheses

The following hypotheses pertaining to various
physiological characteristics of voice recognition
equipment users were tested.

a.  H0: The user's age will not affect his/her recognition
        accuracy.

    H1: Age will affect the total error rates of users of
        voice recognition equipment.

b.  H0: The height and weight of an individual using voice
        technology will not affect overall recognition
        accuracy.

    H1: Recognition accuracy will be affected by an
        individual's weight.

c.  H0: The vital capacity and rate of air flow of a user
        will not affect his/her recognition accuracy.

    H1: Recognition accuracy will be affected by a person's
        vital capacity and rate of air flow.

d.  H0: The overall physical condition of the user will not
        affect his/her recognition accuracy.

    H1: Recognition accuracy will be affected by one's
        physical condition.

e.  H0: Formal speech and/or voice training will not affect
        recognition accuracy.

    H1: A user's recognition accuracy will be affected by
        any formal speech or voice training/therapy.


2.  Age

The subjects ranged in age from 20 to 47 and were divided
into five groups for purposes of the analysis. These groups
and their mean error rates are:

a.  20 to 24  (4.[illegible]%)

b.  25 to 26  (7.03%)

c.  27 to 31  (7.15%)

d.  32 to 35  (5.73%)

e.  36 +      (6.10%)

These five groups were tested to detect differences among
their means. The obtained results (Table XXIII) show that
the null hypothesis, stated above, cannot be rejected and
that the two variables, age and total error rate, are
mutually independent.

TABLE XXIII
AFFECT ON RECOGNITION ACCURACY DUE TO AGE

                         | AGE
-------------------------+---------------
Type of Test             | Kruskal-Wallis
Alpha                    | 0.05
Test Statistic           | 2.26
Critical Level           | > .50
Correlation Coefficient  | -0.05

** = Significant at stated level of significance


3.   Height and Weight

Subjects ranged in height from 60 to 77 inches.
Four groups were generated for analysis and are listed below
with their respective mean error rates.

a.  60 to 64 inches  (5.46%)

b.  65 to 69 inches  (6.67%)

c.  70 to 72 inches  (5.29%)

d.  73 to 77 inches  (7.14%)

The results of the analysis, as summarized in Table XXIV,
indicate that the null hypothesis cannot be rejected. The
small positive correlation coefficient is not significant at
the .05 level and thus the variables in question may be
considered to be independent.

Weights of the subjects ranged from 110 to 240
pounds. Examination for some natural 'break' points in this
range resulted in the creation of the following five groups
and their corresponding mean error rates.

a.  112 to 125 pounds  (6.48%)

b.  126 to 145 pounds  (6.65%)

c.  146 to 175 pounds  (5.13%)

d.  176 to 199 pounds  (7.1£%)

e.  200+ pounds  (5.88%)

The null hypothesis cannot be rejected, with the correlation
coefficient indicating independence between the two
variables.

TABLE XXIV
EFFECT OF HEIGHT AND WEIGHT ON RECOGNITION ACCURACY


                          !      HEIGHT       !      WEIGHT

Type of Test              !  Kruskal-Wallis   !  Kruskal-Wallis

Alpha                     !       .05         !       .05

Test Statistic            !      1.98         !      1.95

Critical Level            !     > .50         !       .75

Correlation Coefficient   !      .121         !      .064

** = Significant at stated level of significance

The similarity in test statistics and correlation
coefficients of height and weight may be explained by
observing the correlation between height and weight itself.
A Pearson product moment correlation of .321 suggests a
strong positive association between the two variables and
thus serves to confirm the similar results of the analysis.
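
For reference, the Pearson product moment coefficient and the t statistic commonly used to judge its significance can be sketched in Python as follows. The function names and the height and weight values are hypothetical and are not drawn from the experimental data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_t(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical heights (inches) and weights (pounds) for five subjects.
heights = [62, 66, 69, 71, 75]
weights = [120, 150, 145, 180, 200]
r_value = pearson_r(heights, weights)
t_value = correlation_t(r_value, len(heights))
```

The resulting t value is compared with the tabled t distribution for n - 2 degrees of freedom, using one tail when the direction of the association is specified in advance.
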
4.   Vital  Capacity  and  Rate  of  Air  Flow 

The vital capacity of participating subjects ranged
from 1917 to 5725 cubic centimeters. The following four
groups were created:

a.  1917 to 2850 cubic centimeters

b.  2851 to 5767 cubic centimeters

c.  29kc to i450 cubic centimeters

d.  4658 to 5725 cubic centimeters

Analysis for differences between the means of the various
groups generated a test statistic (Table XXV) that
resulted in the rejection of the null hypothesis. A
correlation between increased vital capacity and low error
rates was found to be significant using a one-tail test for
negative correlation (critical level: .045).

The rate of air flow characteristic had a range of
212 to 731 liters per minute. This range was divided into
four groups, which were used for the analysis:

a.  212 to 331 liters/min

b.  332 to 460 liters/min

c.  461 to 599 liters/min

d.  600+ liters/min


TABLE XXV
EFFECT OF VITAL CAPACITY AND RATE OF AIR FLOW


                          !  VITAL CAPACITY   !  RATE OF AIR FLOW

Type of Test              !  Kruskal-Wallis   !  Kruskal-Wallis

Alpha                     !       .05         !       .05

Test Statistic            !     8.58 **       !      6.38

Critical Level            !      .0375        !       .095

Correlation Coefficient   !    -.267 **       !    -.318 **

** = Significant at stated level of significance

For rate of air flow, the test statistic does not allow for
the rejection of the null, but a statistically significant
correlation coefficient provides an indication that as rate
of air flow increases, error rates will decrease. Figures 20
and 21 depict mean error rates for effects due to vital
capacity and rate of air flow. Figures 22 and 23 provide the
scatter plots upon which the correlation coefficients were
determined.

Figure 20.   Mean Error Rate vs. Vital Capacity

Figure 22.   Scatter Plot for Vital Capacity

Figure 23.   Scatter Plot for Rate of Air Flow

The dilemma of a non-significant Kruskal-Wallis test
and a significant correlation coefficient can only be
explained by the subjective division of the range of flow
rates into the groups used for the analysis. Biased
grouping could provide a matrix that would yield a
significant test statistic to show a difference between
means but in the final analysis, credibility for this
characteristic as a determinant in personnel selection would
be lost.
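
To make concrete how much the analysis depends on where the group boundaries fall, the following Python sketch assigns a flow rate to one of the four groups listed earlier and averages the error rates within each group. The boundaries are those given in the text; the function names and the sample measurements are hypothetical.

```python
from bisect import bisect_right
from statistics import mean

# Lower bounds of groups b, c and d (liters/min), as listed in the text.
LOWER_BOUNDS = [332, 461, 600]
LABELS = ["a", "b", "c", "d"]


def flow_group(liters_per_min):
    """Return the group label for one subject's rate of air flow."""
    return LABELS[bisect_right(LOWER_BOUNDS, liters_per_min)]


def group_means(flow_rates, error_rates):
    """Mean error rate per group; groups with no subjects are omitted."""
    buckets = {label: [] for label in LABELS}
    for flow, err in zip(flow_rates, error_rates):
        buckets[flow_group(flow)].append(err)
    return {label: mean(vals) for label, vals in buckets.items() if vals}


# Hypothetical measurements for three subjects.
means = group_means([250, 300, 500], [6.0, 8.0, 5.0])
```

Changing the entries of LOWER_BOUNDS moves subjects between groups and alters the group means, which is the sensitivity the preceding paragraph describes.
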

5.   Physical  Condition 

Four groups resulted from the subjects' self-
appraisal of their general physical condition and include
categories of fair/poor, average, good and outstanding
physical condition. Their total error rates were examined
to determine if a difference between the groups existed.
The results presented in Table XXVI do not allow us to
reject the null hypothesis. Additionally, a negligible
correlation coefficient presumes the two variables to be
independent of one another.

Although a subjective response was the determinant
for this characteristic, seven subjects had colds when they
trained the recognizer. Their condition was such that a
distinct nasality was present while they spoke. A Mann-Whitney
test was performed to determine if a difference
between the healthy and 'cold' groups existed. The test
statistic of Table XXVI further verifies our previous
conclusion; the null cannot be rejected. The critical
regions for the Mann-Whitney test correspond to values
greater than 893.6 and less than 771.4.
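
The quoted critical regions are consistent with the normal approximation to the rank sum form of the Mann-Whitney test at a two-tailed .05 level. The Python sketch below assumes, as an inference not stated explicitly in the text, that the 37 subjects without colds (of 44 tested) form the group whose ranks are summed; the function name is hypothetical. Under that assumption the bounds agree with the values given above.

```python
import math

Z_TWO_TAILED_05 = 1.959964  # standard normal quantile for a two-tailed .05 test


def rank_sum_acceptance_bounds(n1, n2, z=Z_TWO_TAILED_05):
    """Bounds of the acceptance region for the rank sum of the first group."""
    n_total = n1 + n2
    mean_t = n1 * (n_total + 1) / 2
    sd_t = math.sqrt(n1 * n2 * (n_total + 1) / 12)
    return mean_t - z * sd_t, mean_t + z * sd_t


# Assumed group sizes: 37 healthy subjects and 7 subjects with colds.
lower, upper = rank_sum_acceptance_bounds(37, 7)

# The statistic reported for the 'cold' comparison in Table XXVI.
reported_statistic = 821.5
cannot_reject = lower <= reported_statistic <= upper
```

A statistic falling between the two bounds lies in the acceptance region, which matches the conclusion that the null cannot be rejected.
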

Finally, the analysis for the effect due to formal
speech therapy or voice training resulted in a test
statistic that would not allow for the rejection of the null
hypothesis, that speech therapy or voice training will not
affect a user's recognition accuracy. Critical regions
corresponded to values greater than 835 and less than 695.

TABLE XXVI
EFFECT ON RECOGNITION ACCURACY DUE TO PHYSICAL CONDITION


                          !    PHYSICAL     !     SPEECH      !
                          !    CONDITION    !    TRAINING     !      COLD

Type of Test              ! Kruskal-Wallis  !  Mann-Whitney   !  Mann-Whitney

Alpha                     !      0.05       !      .05        !      .05

Test Statistic            !      2.57       !    761.20       !     821.5

Critical Level            !      .45        !      .46        !      ipp

Correlation Coefficient   !      0.03       !      NA         !      NA

** = Significant at stated level of significance

F.   PSYCHOLOGICAL CHARACTERISTICS

1.   Hypotheses

a.  H0:  Anxiety will not affect the recognition
accuracy of a user.

H1:  Anxiety will affect the total error rate of
a user.


b.  H0:  The cooperativeness of a speaker will not
affect his/her total error rate.

H1:  Speaker cooperativeness will affect
recognition accuracy.


c.  H0:  The occurrence of recognition errors will
not affect overall recognition accuracy.

H1:  A speaker's overall error rate will be
affected by the psychological influence of
mis- and non-recognitions.


d.  H0:  A speaker's beliefs in voice technology as
a time saving job aid will not affect
recognition accuracy.

H1:  The attitude a person possesses toward the
influence of voice on a computer operator's
job and their willingness to use voice
because of this influence will affect
recognition accuracy.


e.  H0:  The attitude a speaker has about computers
and information processing will have no
psychological effect on recognition
accuracy.

H1:  A speaker's psychological attitude
concerning automation and data processing
will affect recognition accuracy.

2.   Psychological Anxiety

The results of the State-Trait Anxiety Inventory are
depicted graphically in Figures 24 to 26. Figures 24 and 25
show some indication that individuals with a lower state
anxiety acquired fewer errors. The relationship between
error rate and trait anxiety, shown in Figure 26, depicts a
more randomized occurrence of error rates. Correlation
analysis substantiates this in that the correlation with
state anxiety during Week #1 is statistically significant,
with Week #2 showing some positive correlation that is not
significant at the .05 level. There is no significant
positive correlation between trait anxiety and error rates.

The obtained STAI scores yielded a normal
distribution and equal sample sizes of high and low anxiety
users. With the basic assumptions for use of a parametric
test met, a two sample t-test was used to detect differences
between groups. Additionally, the non-parametric Mann-Whitney
test was applied for purposes of further
verification; however, it does not possess the power of its
parametric counterpart. Results of the analysis are
included in Table XXVII.
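
A two sample t-test of the kind described can be sketched in Python as shown below. The pooled-variance form is assumed here because the text does not specify which variant was used; the function name and the sample error rates are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for a pooled-variance two sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical error rates (percent) for low- and high-anxiety users.
low_anxiety = [4.2, 5.1, 3.8, 6.0]
high_anxiety = [6.5, 7.2, 5.9, 8.1]
t_value, degrees_of_freedom = pooled_t_statistic(low_anxiety, high_anxiety)
```

The t value is referred to the t distribution with the returned degrees of freedom; equal group sizes, as obtained here, make the pooled form less sensitive to unequal variances.
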

In all cases using non-parametric analysis the null
hypothesis cannot be rejected, although the critical level
shows the test statistic to be just within the acceptance
region. The dichotomy in the trait anxiety analysis is
interesting; the more powerful parametric test allows the
rejection of the null hypothesis whereas the opposite exists
using the Mann-Whitney. In both instances, though, the test
statistic lies extremely close to the point separating the
acceptance and critical regions.

Figure 24.   Mean Error Rate vs. State Anxiety (Week #1)

Figure 25.   Mean Error Rate vs. State Anxiety (Week #2)

The effect due to anxiety may be considered
inconclusive because of the resultant statistical analysis.
Although showing significant correlation in Week #1, any
anxiety in Week #2 may have been overcome or masked by
familiarity and experience with equipment and procedures.
By Week #3 and the administration of the Trait inventory,
subjects were thoroughly versed in the experimental
procedure. The inconsistent results nevertheless leave
reason to believe that anxiety has an effect on speech and
hence recognition accuracy, but the degree to which it does
remains a clouded issue.

3.   Speaker Cooperativeness

Subjects evaluated their degree of cooperativeness
on an interval scale with subsequent creation of the
following groups.

a.  Less than cooperative speakers

b.  Moderately cooperative speakers

c.  Very cooperative speakers

d.  Extremely cooperative speakers (subjects who
marked the 'anchor point' of the scale)

The results of the analysis are presented in Table XXVIII,
with mean error rates graphically represented in Figure 27.
The null hypothesis is rejected due to a test statistic
greater than the Chi-square value of 7.815. Multiple
comparisons among the groups reflect a difference
between the 'less than cooperative' and 'extremely
cooperative' speakers only. Despite indication of some
correlation between high cooperativeness and low error rate,
the computed coefficient is not significant at a .05 level
(Critical Level: 0.095).
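
The critical levels tabulated with each Kruskal-Wallis statistic appear to be upper-tail chi-square probabilities, with degrees of freedom equal to the number of groups minus one. The Python sketch below evaluates that probability in closed form for integer degrees of freedom; the function name is hypothetical and the routine is offered only as a check on the tabled values, not as the method used to produce them.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        terms = sum(z ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-z) * terms
    # Odd df: complementary error function plus a finite series.
    series = sum(z ** (k - 0.5) / math.gamma(k + 0.5)
                 for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(z)) + math.exp(-z) * series


# Four cooperativeness groups give three degrees of freedom.
p_at_critical_value = chi2_sf(7.815, 3)   # close to the .05 level
p_cooperativeness = chi2_sf(16.82, 3)     # statistic reported in Table XXVIII
```

Evaluating the function at 7.815 with three degrees of freedom returns a value close to .05, and the cooperativeness statistic of 16.82 yields a probability below .005, in agreement with the table.
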

These  results  led  to  a  further  analysis  from  a 
perspective  of  speaker  participation.  That  is,  did  the 
subject  like  participating  in  this  type  of  experimentation 
and  if  so,  could  it  be  correlated  to  total  error  rate? 
Their  subjective  responses  resulted  in  the  creation  of  three 
generic groups as follows:

Figure 27.   Mean Error Rate vs. Speaker Cooperativeness


TABLE XXVIII
EFFECT OF SPEAKER COOPERATION AND PARTICIPATION


                          !  COOPERATIVENESS  !   PARTICIPATION

Type of Test              !  Kruskal-Wallis   !  Kruskal-Wallis

Alpha                     !       .05         !       .05

Test Statistic            !    16.82 **       !      4.76

Critical Level            !     < .005        !       .095

Correlation Coefficient   !      -.226        !    +.278 **

** = Significant at stated level of significance

a.  Those who don't care

b.  Persons who like to participate

c.  Persons who strongly like to participate

In this instance the attainment of a positive correlation,
indicating that those who liked to participate acquire
higher error rates, is counter-intuitive. The null cannot be
rejected based on the computed test statistic given in Table
XXVIII. A correlation of .636 between subject responses to
cooperativeness and participation is not as large as was
expected and as such could, in part, have led to the
divergent results. Whether these results are due to willing
participants trying too hard to perform well, and thus
having greater than usual mis- or non-recognitions, is
unclear.

4.   Recognition  Errors 

Subjects responded to two questions, one pertaining
to their feelings at the time of a mis-recognition and the
other pertaining to their feelings over a non-recognition
(beep). Their responses to these two questions were
averaged to represent how they felt toward the occurrence of
an error and this led to the creation of two distinct
groups: those who don't like an error to occur and those who
feel they are not disturbed or bothered by an error. The
results of the analysis are summarized in Table XXIX.

TABLE XXIX
EFFECT OF RECOGNITION ERRORS


                          !       ERRORS

Type of Test              !    Mann-Whitney

Alpha                     !        0.05

Test Statistic            !       612.50

Critical Level            !        .0897

Correlation Coefficient   !       -0.225

** = Significant at stated level of significance

The null hypothesis cannot be rejected and although the
negative correlation coefficient indicates that those who
dislike errors tend to have higher error rates, it is not
significant at an alpha of .05 (Critical Level: .07).

5.   Attitudes Toward the Use of Voice

Questions 4, 6 and 8 of User Questionnaire #2 were
used to measure the speaker's attitudes toward voice
technology. The results (Table XXX) indicate a
statistically significant correlation between high error
rates and a favorable attitude toward voice recognition as a
means of saving time and reducing the burden on a computer
operator. Scatter plots of responses to these questions and
associated error rates are depicted in Figures 28-30.
Multiple comparisons between the groups showed differences
between those who would always use voice and those who would
seldom use it despite its pronounced advantages, and between
those who felt that the advantages of voice will give the
keyboard operator other jobs and those who disagree with
such an attitude. Therefore, the null hypothesis cannot be
rejected in terms of a speaker's attitude concerning the
influence on a data processor's job due to voice
recognition. On the other hand, a speaker's willingness to
use voice recognition because of his/her beliefs in its
requisite advantages will affect error rates.

Figure 28.   Scatter Plot:   Mean Error Rate vs. Question #4

Figure 29.   Scatter Plot:   Mean Error Rate vs. Question #6

Figure 30.   Scatter Plot:   Mean Error Rate vs. Question #8

As was noted earlier, the presence of a positive
correlation appears to be contrary to popular belief. One
would imagine that a user who believes voice recognition can
make the job of a computer operator easier (Question #4)
would tend toward better recognition accuracy. Questions
six and eight were asked for the purpose of determining if
a user's error rate might be influenced by the subconscious
thought of incurring additional duties because of the
efficiency and effectiveness of voice input. But, despite
the possibility of additional tasks, potential users still
would prefer voice to manual entry. However, the presence
of a significant positive correlation may only be attributed
to the uniqueness of the situation; i.e., as in speaker
participation, subjects who professed a strong desire to use
voice regardless of consequences may have tried too hard for
high accuracy and as a result have failed to speak in a
'natural' manner.

6.   Attitude Toward Computers and Information Processing

In response to two sets of questions, subjects
provided their attitudes surrounding the necessity of
computers in today's society and how voice technology would
aid information processing or data input. Attitudes towards
computers fell into three general categories.

a.  Persons who feel computers are unnecessary.

b.  Persons who feel computers are necessary in
society, but are not a panacea for all problems.

c.  Those who feel that computers are an absolute
necessity.

Attitudes toward voice recognition and information
processing resulted in four categories.

a.  Those believing that voice would take more time
for information or data processing.

b.  Those with no opinion.

c.  Those who feel voice will save some time.

d.  Those who feel voice can save immeasurable time
compared to conventional methods of data entry
and information processing.

Results of the analysis are summarized in Table XXXI. Based
on these results, the null hypothesis cannot be rejected and
thus, it may be concluded that the opinion or attitude a
person possesses towards computers, and their feelings
pertaining to voice as a time saving advantage, will not
affect their recognition accuracy.

TABLE XXXI
EFFECT DUE TO ATTITUDES TOWARD COMPUTERS
AND DATA PROCESSING


                          !    COMPUTERS      !  DATA PROCESSING

Type of Test              !  Kruskal-Wallis   !  Kruskal-Wallis

Alpha                     !       .05         !       .05

Test Statistic            !       .78         !      3.38

Critical Level            !      > .8         !       .15

Correlation Coefficient   !      .111         !     -.164

** = Significant at stated level of significance




G.   VOCABULARY  ERRORS 

As a result of using different numbers of syllables in
the vocabulary, it was also possible to get an indication of
how well utterances with different numbers of syllables were
recognized. Originally done in a longitudinal study [Ref.
£4: pp. 9-10], it is analyzed within the context of this
document as further verification of these earlier results.
This is shown by weeks in Figure 31 and over all conditions
in Figure 32. Both figures illustrate a generally declining
error rate as a function of the number of syllables in the
utterance. Although the current experimentation yielded an
approximately 1.5 percent rise in error rate from three to
four syllables, it is not a large deviation from the earlier
study, which indicated little change in error rates between
three- and four-syllable words.

In terms of overall effectiveness, a practical
application would dictate the fewest recognition
errors. Therefore, an error rate of 5.91% still remains two
to three percent better than that of utterances with a
smaller syllabic content. Despite the higher rate for
four-syllable compared to five-syllable words, the difference
is still less than that of one to four or two to four
syllables. The variety of vocabulary items used in this
experiment further confirms the argument that through a
careful and judicious selection of vocabulary items, large
vocabulary difficulties and associated high error rates may
be reduced.

VI.   CONCLUSIONS

Following the lengthy elaboration of results in the
previous section it would be helpful to recapitulate, in a
brief summary form, the responses of the different variables
tested. Variables resulting in a statistically significant
test statistic included:

--  Method of training

--  Experience of the user

--  Previous computer experience

--  Level of education (all subjects)

--  Vital capacity

--  Speaker cooperativeness

Each of the following variables produced a significant
correlation with recognition error rate:

--  Previous computer experience

--  Time of the week

--  Experience of the user

--  Level of education (all subjects)

--  Speaker participation

--  Vital capacity

--  Rate of air flow

--  State anxiety (first week only)

--  User attitudes pertaining to voice

The following variables resulted in a non-significant
test statistic and/or correlation coefficient:

--  Job function

--  Branch of service

--  Job satisfaction

--  Service satisfaction

--  Foreign language competency

--  Time of day

--  Time of week (test statistic only)

--  Ease of use of voice equipment

--  Level of education (naive users)

--  Socio-economic class

--  Dental care

--  Race

--  Marital status and family size

--  Religious preference

--  Accent

--  Place of birth/geographic origin

--  Age

--  Height and weight

--  Rate of air flow (test statistic)

--  Physical conditioning/speech training

--  Anxiety: State and Trait

--  Speaker cooperativeness (correlation)

--  Speaker participation (test statistic)

--  Effect of recognition errors

--  Attitudes toward computers/data processing

The wide range in error rates, .50 to 15.7 percent, for
the individual subjects (see Appendix J for a complete
summary) indicates an obvious variability between subjects.
Within the context of the main experiment and the associated
ANOVA, the three variables of job function, training method,
and experience (trials) are independent events and are
protected from confounding due to the experimental design.
The selection of a level of significance equal to .05 is
merely to show a possible existence of some effect, not to
demonstrate a rigorous test of a stated hypothesis. As the
analysis progresses to the extraction of numerous other
human factors, these protections and the accompanying power
of a parametric test are reduced. In some instances an
awareness of a possible dependence between conditions is
necessary prior to reaching an ultimate conclusion. For
example, were those subsets of a category achieving
statistical significance also trained with supervision
and/or experienced users and if so, how many were in that
particular subset?

The results presented herein suggest that speaker
variability would not affect recognition accuracy to such an
extent as to restrict its use to only specially selected
users. For implementation in military applications, this
proves to be especially satisfying since it would relieve the
services of the necessity of classifying personnel into
particular military occupational specialties or
subspecialties for the express purpose of operating voice
equipment. It is apparent from the experimentation, and the
diversity of skills and experience contained within the
sample population, that practically anyone may be a
potential candidate to operate voice recognition equipment.

The phrase 'practically anyone' should be qualified
here. Interspeaker variability had a significant impact in
the case of one subject, who possessed a severe speech
impairment: stuttering. It became obvious in the early
stages of training that he would be unable to finish the
training phase. In fact, after 30 minutes, only 11
utterances had been satisfactorily placed into memory.
Although the individual was eliminated as an experimental
subject, his difficulty demonstrates that although most
anyone can use this type of technology, there will always
exist those, albeit few in number, who for one reason or
another are unable to attain a suitable level of recognition
accuracy.

The current experimentation has clearly shown that
experience and method of training voice equipment can
provide excellent recognition accuracy rates. Of course,
what determines an 'excellent' rate is purely subjective and
dependent upon the application in which the equipment is
emplaced. What makes this observation readily appealing is
that both characteristics are controlled by the human. They
are not factors that one is born with or has inherited.
Rather, with closely supervised training procedures, by an
experienced operator, a 'naive' user can quickly attain
recognition rates greater than 95 percent and, with
repetitive experience, increase this accuracy until errors
are reduced to less than two percent. It must be reiterated
that in the present experiment, subjects were not allowed to
retrain the recognizer during the three weeks of recognition
testing. In actuality, the speaker would retrain an
utterance rather than continue incurring mis- or
non-recognition errors.

To a lesser degree, speaker cooperativeness and amount
of previous computer experience are definitely factors to be
considered. The latter characteristic influences the
personnel selection process while speaker cooperativeness,
like training and experience, can be influenced by the human
element. Certainly, because of data processing experience,
such individuals can readily identify with the advantages of
speech input and thereby become more highly cooperative
speakers. Thus combined, these two factors strongly support
the potential for achieving high recognition accuracy.

The presence of occasional positive correlation
coefficients that were statistically significant is
difficult to explain or resolve conclusively. Such
instances as level of participation, desire to use voice,
and attitudes pertaining to voice provided misleading
results. It was surmised that speakers who are willing
participants and find voice to be a technology that they
would likely use would achieve low error rates. The
observation to the contrary supposes that many of those
speakers tried too hard for perfect recognition accuracy,
and as a result, were less apt to speak naturally. In
effect, they were trying to outsmart the machine.

Thus, in an operational environment it becomes incumbent
upon both the speaker and the supervisor to fully embrace
the concept of voice technology for use in a practical
application. In demonstrations at the Naval Postgraduate
School it is frequently noted that observers are genuinely
impressed with the capabilities of voice input of data until
that one error, sometimes after more than 200 successfully
recognized utterances, occurs and they sit back and remark
that perhaps "additional research is needed prior to placing
it into operational use". It is obvious that voice
technology is acceptable for use in a military command
center and must be fully supported by the Commander and his
Staff. If it is, error rates can be minimized by human
controls such as training and experience. In conclusion,
consistency may best describe the key to speaker
variability. Attitudes, training, and experience together
produce consistency in speech, and consistency generates a
continued high recognition accuracy rate.

APPENDIX A
USER QUESTIONNAIRE #1


NAME:                    SUBJECT #:


INSTRUCTIONS:

The purpose of this questionnaire is to obtain information
from you regarding physical characteristics, personal
background, and opinions pertaining to voice recognition
equipment and its use. Your answers will assist in
determining whether personal and/or physiological traits
contribute to effective utilization of voice recognition
equipment.

The questions include multiple choice, YES/NO, rating scale
and short answer (one or two words ONLY!) types.
Appropriate guidance accompanies each question or block of
questions.

Your name is NOT required but is requested in order to ease
the necessary correlation of your replies with your results
in the experimentation. If you desire anonymity, please
respond with your subject number only. Please respond
truthfully. Check your questionnaire after completion to
ensure you've completed all the questions.

Thank you for your assistance in this experiment.



In questions 1 - 22, provide either a one or two word
response, or place an 'X' by the appropriate answer.


1.   What  is  your  age? 


2.   What  is  your  height  (in  inches)? 


3.   What  is  your  weight? 


4.   What is your race?

White (Caucasian)
Yellow (Asian/Mongoloid)
Black (Negroid/African)
Red (American Indian)

5.   What is your nationality?

Native Citizen of the United States
Naturalized Citizen of the United States
Alien

6.   What is your religious preference?
(See Attached Sheet)

7.   What is your ethnic background?

Puerto Rican
Filipino
Mexican
Cuban
Latin American (persons from Central or S. America)
Other Hispanic Descent (Extraction not delineated
as Mexican, Puerto Rican, Cuban or Latin American)
Aleut
Indian
Melanesian
Chinese
Japanese
Korean
Polynesian
Vietnamese
Other Asian Descent (Extraction not delineated as
Chinese, Japanese, Korean, Indian, Filipino, or
Vietnamese)
None of the Above
Other (Please specify )

8.   Do you have an accent?

YES (What kind? )
NO

9.   What is your Marital Status?

Married
Divorced
Single
Other (separated, widowed)

10.  How many children do you have?

0
1
>4

11.  Do you wear glasses?

YES
NO

12.  Have you ever had orthodontist care and/or wear/worn
braces?

YES
NO

13.  What is your level of education?

Non High School Graduate
High School Graduate
Associate Degree
1 year of college
2 years of college
3 years of college
4 years of college (no degree)
College graduate (BA/BS)
Graduate work of more than 1 year (no degree)
Masters Degree received
Doctorate Degree received

14.  What  state  were  you  born  in? 

15.  During ages 1-18, in what state did you principally
reside?

16.  What has been your state of residence for the majority
of the last three years?

17.  Do you speak any foreign language(s)?

YES [which one(s) ]

NO

18.  What  is  your  branch  of  service? 

Navy 

Army 

Marine  Corps 

Air  Force 

Other (civilian)

19.  How many years have you been in the service?

20.  Have you ever been overseas for more than 13
consecutive months? (not including leave or vacation)

YES (go to question #21)

NO (go to question #22)

21.  How many months were you overseas?
In what country?

22.  What do you consider to be your socioeconomic class?

Lower Class
Upper Lower Class
Lower Middle Class
Middle Class
Upper Middle Class
Lower Upper Class
Upper Class


In questions 23 - 40 place an 'X' on a point on the scale
that best indicates or describes your feelings. The 'X' may
be placed anywhere along the scale.


23.  How do you feel about the job or position you currently
have?

LIKE VERY MUCH
LIKE
NEUTRAL
DISLIKE
DISLIKE VERY MUCH


24.  How much satisfaction do you derive from being a member
of the Armed Services?


i  p  ry     *  o  -^ 


:A::i} 


SA 


t  T  C  T  T  T  I" 


"BC^I"?  ~  "  ' I 


UNSATI SZ  !ZZ 


25.  Computers are necessary in today's society.

DECIDEDLY AGREE
SLIGHTLY AGREE
NO OPINION / DON'T KNOW
SLIGHTLY DISAGREE
DECIDEDLY DISAGREE


26.  How would voice recognition make a computer operator's
job?

MUCH EASIER
SOMEWHAT EASIER
NO OPINION
MORE DIFFICULT
MUCH MORE DIFFICULT


27.  How would voice recognition equipment affect
information processing or data input?

SAVE A LOT OF TIME
SAVE SOME TIME
NO OPINION / DON'T KNOW
TAKES MORE TIME
TAKES A LOT MORE TIME


28.  If voice recognition can save time, it would allow a
keyboard operator to do other jobs.

DECIDEDLY AGREE
SLIGHTLY AGREE
NO OPINION / DON'T KNOW
SLIGHTLY DISAGREE
DECIDEDLY DISAGREE


29.  Describe the use of voice recognition equipment.

VERY EASY TO USE
EASY TO USE
NO OPINION
DIFFICULT TO USE
VERY DIFFICULT TO USE


30.  What do you think of voice recognition equipment for
use in Military Command Centers?

VERY PRACTICAL
SOMEWHAT PRACTICAL
NO OPINION / DON'T KNOW
SOMEWHAT IMPRACTICAL
VERY IMPRACTICAL


31.  How much previous computer experience have you had?

A LOT OF EXPERIENCE
CONSIDERABLE EXPERIENCE
SOME EXPERIENCE
VERY LITTLE EXPERIENCE
NO EXPERIENCE




32.  What is your previous experience with voice recognition
equipment?

VERY MUCH
MUCH
SOME
A LITTLE
NONE


33.  How would additional experience with voice recognition
equipment affect recognition accuracy?

MUCH IMPROVEMENT
SOME IMPROVEMENT
NO OPINION
A LITTLE IMPROVEMENT
NO IMPROVEMENT


34.  How do you feel when a misrecognition occurs?

STRONGLY LIKE
LIKE
NEUTRAL
DISLIKE
STRONGLY DISLIKE


35.  How do you feel when a non-recognition ('beep') occurs?

STRONGLY LIKE
LIKE
NEUTRAL
DISLIKE
STRONGLY DISLIKE


36.  How do you feel when a recognition occurs?

STRONGLY LIKE
LIKE
NEUTRAL
DISLIKE
STRONGLY DISLIKE


37.  Describe your participation in this experiment.

EXTREMELY COOPERATIVE
MODERATELY COOPERATIVE
COOPERATIVE
SOMEWHAT UNCOOPERATIVE
VERY UNCOOPERATIVE


38.  How would you describe your participating in this type
of experimentation?

STRONGLY LIKE
LIKE
NEUTRAL
DISLIKE
STRONGLY DISLIKE


39.  What is your current physical condition?

OUTSTANDING
GOOD
AVERAGE
FAIR
POOR


40.  If voice recognition does save time and allows YOU to
be assigned other tasks, how often would YOU want to use it?

ALWAYS
FREQUENTLY
NOW AND THEN
SELDOM
NEVER




APPENDIX B
USER QUESTIONNAIRE #2


NAME:  SUBJECT #:


INSTRUCTIONS:

The purpose of this questionnaire is to obtain information
from you regarding physical characteristics, personal
background, and opinions pertaining to voice recognition
equipment and its use. Your answers will assist in
determining whether personal and/or physiological traits
contribute to effective utilization of voice recognition
equipment.

The questions include multiple choice, YES/NO, rating scale
and short answer (one or two words ONLY!) types.
Appropriate guidance accompanies each question or block of
questions.

Your name is NOT required but is requested in order to ease
the necessary correlation of your replies with your results
in the experimentation. If you desire anonymity, please
respond with your subject number only. Please respond
truthfully. Check your questionnaire after completion to
ensure you've completed all the questions.

Thank you for your assistance in this experiment.

In questions 1-3, provide either a one or two word
response, or place an 'X' by the appropriate answer.

1.  Have you ever had one or more of the following speech
impediments and/or impairments?

Articulation (difficulty in pronouncing vowels
and/or consonants)
Voice (irregularities in the larynx)
Cleft lip and/or lip palate
Cerebral palsy
Stuttering
Hearing impairments
Aphasia
Congenital speech defects (due to birth/pregnancy)
Retardation
None of the above

2.  Have you ever received speech therapy from either a
subsidized (free) clinic, private speech therapist, or
through the public school system?

YES

NO

3.  Have you ever received voice training or taken singing
lessons?

YES (How many years? )

NO



In questions 4-15 place an 'X' on a point on the scale
that best indicates or describes your feelings. The 'X' may
be placed anywhere along the scale.


4.   How would voice recognition make a computer operator's
job?

MUCH EASIER
SOMEWHAT EASIER
NO OPINION
MORE DIFFICULT
MUCH MORE DIFFICULT


5.   How would voice recognition equipment affect
information processing or data input?

SAVE A LOT OF TIME
SAVE SOME TIME
NO OPINION / DON'T KNOW
TAKES MORE TIME
TAKES A LOT MORE TIME


6.   If voice recognition can save time, it would allow a
keyboard operator to do other jobs.

DECIDEDLY AGREE
SLIGHTLY AGREE
NO OPINION / DON'T KNOW
SLIGHTLY DISAGREE
DECIDEDLY DISAGREE


7.   Describe the use of voice recognition equipment.

VERY EASY TO USE
EASY TO USE
NO OPINION
DIFFICULT TO USE
VERY DIFFICULT TO USE


8.   If voice recognition does save time and allows YOU to
be assigned other tasks, how often would YOU want to use it?

ALWAYS
FREQUENTLY
NOW AND THEN
SELDOM
NEVER


9.   How would additional experience with voice recognition
equipment affect recognition accuracy?

MUCH IMPROVEMENT
SOME IMPROVEMENT
NO OPINION
A LITTLE IMPROVEMENT
NO IMPROVEMENT


10.  How do you feel when a misrecognition occurs?

STRONGLY LIKE
LIKE
NEUTRAL
DISLIKE
STRONGLY DISLIKE


11.  How do you feel when a non-recognition ('beep') occurs?

STRONGLY LIKE
LIKE
NEUTRAL
DISLIKE
STRONGLY DISLIKE




12.  How do you feel when a recognition occurs?

STRONGLY LIKE
LIKE
NEUTRAL
DISLIKE
STRONGLY DISLIKE


13.  Describe your participation in this experiment.

EXTREMELY COOPERATIVE
MODERATELY COOPERATIVE
COOPERATIVE
SOMEWHAT UNCOOPERATIVE
VERY UNCOOPERATIVE


14.  How would you describe your participating in this type
of experimentation?

STRONGLY LIKE
LIKE
NEUTRAL
DISLIKE
STRONGLY DISLIKE


15.  What do you think of voice recognition equipment for
use in Military Command Centers?

VERY PRACTICAL
SOMEWHAT PRACTICAL
NO OPINION / DON'T KNOW
SOMEWHAT IMPRACTICAL
VERY IMPRACTICAL

APPENDIX C
SELF-EVALUATION QUESTIONNAIRE


NAME          DATE          SUBJECT#

DIRECTIONS: A number of statements which people have used
to describe themselves are given below. Read each statement
and then circle the appropriate number to the right of the
statement that indicates how you GENERALLY feel. There are
no right or wrong answers. Please do not spend too much
time on any one statement, but give the answer which seems
to describe how you GENERALLY feel.

1 = ALMOST NEVER

2 = SOMETIMES

3 = OFTEN

4 = ALMOST ALWAYS

 1.  I feel pleasant                            1  2  3  4
 2.  I tire quickly                             1  2  3  4
 3.  I feel like crying                         1  2  3  4
 4.  I wish I could be as happy as              1  2  3  4
     others seem to be
 5.  I am losing out on things because          1  2  3  4
     I can't make up my mind soon enough
 6.  I feel rested                              1  2  3  4
 7.  I am "calm, cool, and collected"           1  2  3  4
 8.  I feel that difficulties are               1  2  3  4
     piling up so that I cannot
     overcome them
 9.  I worry too much over something            1  2  3  4
     that really doesn't matter
10.  I am happy                                 1  2  3  4
11.  I am inclined to take things hard          1  2  3  4
12.  I lack self confidence                     1  2  3  4
13.  I feel secure                              1  2  3  4
14.  I try to avoid facing a crisis             1  2  3  4
     or difficulty
15.  I feel blue                                1  2  3  4
16.  I am content                               1  2  3  4
17.  Some unimportant thought runs              1  2  3  4
     through my mind and bothers me
18.  I take disappointments so keenly           1  2  3  4
     that I can't put them out of my
     mind
19.  I am a steady person                       1  2  3  4
20.  I get in a state of tension or             1  2  3  4
     turmoil as I think over my recent
     concerns and interests


SCORING KEY
for the
A-TRAIT EVALUATION

 1.  4  3  2  1
 2.  1  2  3  4
 3.  1  2  3  4
 4.  1  2  3  4
 5.  1  2  3  4
 6.  4  3  2  1
 7.  4  3  2  1
 8.  1  2  3  4
 9.  1  2  3  4
10.  4  3  2  1
11.  1  2  3  4
12.  1  2  3  4
13.  4  3  2  1
14.  1  2  3  4
15.  1  2  3  4
16.  4  3  2  1
17.  1  2  3  4
18.  1  2  3  4
19.  4  3  2  1
20.  1  2  3  4


APPENDIX D
SELF-EVALUATION QUESTIONNAIRE


NAME          DATE          SUBJECT#


DIRECTIONS: A number of statements which people have used
to describe themselves are given below. Read each statement
and then circle the appropriate number to the right of the
statement that indicates how you feel RIGHT NOW - AT THIS
VERY MOMENT. There are no right or wrong answers. Please
do not spend too much time on any one statement, but give
the answer that best describes your PRESENT feelings.

1 = NOT AT ALL

2 = SOMEWHAT

3 = MODERATELY SO

4 = VERY MUCH SO


 1.  I feel calm                                1  2  3  4
 2.  I feel secure                              1  2  3  4
 3.  I am tense                                 1  2  3  4
 4.  I am regretful                             1  2  3  4
 5.  I feel at ease                             1  2  3  4
 6.  I feel upset                               1  2  3  4
 7.  I am presently worrying                    1  2  3  4
     over possible misfortunes
 8.  I feel rested                              1  2  3  4
 9.  I feel anxious                             1  2  3  4
10.  I feel comfortable                         1  2  3  4
11.  I feel self-confident                      1  2  3  4
12.  I feel nervous                             1  2  3  4
13.  I am jittery                               1  2  3  4
14.  I feel "high strung"                       1  2  3  4
15.  I am relaxed                               1  2  3  4
16.  I feel content                             1  2  3  4
17.  I am worried                               1  2  3  4
18.  I feel over-excited                        1  2  3  4
     and "rattled"
19.  I feel joyful                              1  2  3  4
20.  I feel pleasant                            1  2  3  4


SCORING KEY
for the
A-STATE EVALUATION

 1.  4  3  2  1
 2.  4  3  2  1
 3.  1  2  3  4
 4.  1  2  3  4
 5.  4  3  2  1
 6.  1  2  3  4
 7.  1  2  3  4
 8.  4  3  2  1
 9.  1  2  3  4
10.  4  3  2  1
11.  4  3  2  1
12.  1  2  3  4
13.  1  2  3  4
14.  1  2  3  4
15.  4  3  2  1
16.  4  3  2  1
17.  1  2  3  4
18.  1  2  3  4
19.  4  3  2  1
20.  4  3  2  1
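The A-State scoring key above can be applied mechanically. The following is a minimal sketch, not the procedure actually used in the experiment: it assumes the subject's total is the simple sum of the 20 weighted items, and the names A_STATE_REVERSED and score_questionnaire are illustrative only. Items whose key row reads 4 3 2 1 are treated as reverse-scored.

```python
# Items whose weights run 4-3-2-1 in the A-State scoring key (reverse-scored).
A_STATE_REVERSED = {1, 2, 5, 8, 10, 11, 15, 16, 19, 20}


def score_questionnaire(responses, reversed_items):
    """Sum the weighted score of a 4-point questionnaire.

    responses: dict mapping item number to the circled value (1-4).
    reversed_items: item numbers weighted 4, 3, 2, 1 instead of 1, 2, 3, 4.
    Assumption: the total score is the plain sum of the item weights.
    """
    total = 0
    for item, circled in responses.items():
        if circled not in (1, 2, 3, 4):
            raise ValueError(f"item {item}: response must be 1-4, got {circled}")
        # A circled value c on a reverse-scored item carries weight 5 - c.
        total += 5 - circled if item in reversed_items else circled
    return total


# Hypothetical subject who circles 2 on every one of the 20 items.
example = {item: 2 for item in range(1, 21)}
print(score_questionnaire(example, A_STATE_REVERSED))  # prints 50
```

Passing a different set of reverse-scored item numbers lets the same function apply the A-Trait key.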


APPENDIX E
UTTERANCE LIST:   TRAINING WEEK - WEEK #1

WORD#  UTTERANCE                    CRT PROMPT

000    THREE                        THREE
001    EUROPE                       EUROPE
002    MOVE IT LEFT                 MOVE IT LEFT
003    CARRIAGE RETURN              CARR RETURN
004    LOGOUT                       LOGOUT
005    COMMAND                      COMMAND
006    STRAIT OF HORMUZ             STR OF HMRZ
007    TIME                         TIME
008    KOREA                        KOREA
009    ZERO                         ZERO
010    CHANGE DIRECTORY TO POOCK    C DIR TO PK
011    ALPHA                        ALPHA
012    POSITIVE                     POSITIVE
013    IDENTIFICATION               IDNTIICATION
014    LAUNCH                       LAUNCH
015    RELOCATE                     RELOCATE
016    DELTA                        DELTA
017    TASK FORCE COMMANDER         TSK FRC CDR
018    KILO                         KILO
019    LOGIN YELLEN                 LOGIN YELLEN
020    ECHO                         ECHO
021    NOVEMBER                     NOVEMBER
022    TWO                          TWO
023    UNITED STATES                UNITED STS
024    FOUR                         FOUR
025    BRAVO                        BRAVO
026    PLACE A CIRCLE ON MOSCOW     PL A CIR MOS
027    ENEMY DETECTION              EN DETECTION
028    PROCEED                      PROCEED
029    ROMEO                        ROMEO
030    FLIGHT CONTROLLER            FLT CTLR
031    SEVEN                        SEVEN
032    GROUND CONTROL APPROACH      GND CTL APPR
033    REPORT                       REPORT
034    AIRFIELD NAME                AFLD NAME
035    LIMA                         LIMA
036    AVAILABLE                    AVAILABLE
037    MESSAGE                      MESSAGE
038    SATELLITE                    SATELLITE
039    SHOOT                        SHOOT
040    YANKEE                       YANKEE
041    AFFIRMATIVE                  AFFIRMATIVE
042    CHARLIE                      CHARLIE
043    TORPEDO                      TORPEDO
044    FIVE                         FIVE
045    OPERATIONS PLAN              OPNS PLAN
046    OFFENSE                      OFFENSE
047    UP IN DETAIL                 UP IN DETAIL
048    NINE                         NINE
049    PROBABILITY OF DETECTION     PROB OF DETN
050    NEUTRAL                      NEUTRAL
051    JULIETT                      JULIETT
052    SPEED                        SPEED
053    UNIFORM                      UNIFORM
054    SENSOR                       SENSOR
055    TANGO                        TANGO
056    CLOSE OUT CHARLIE            CLS OUT CHRL
057    LOAD THE GANN                LD THE GANN
058    OSCAR                        OSCAR
059    NORTH ATLANTIC MAP           N ATL MAP
060    PACIFIC DATA BASE            PAC DAT BASE
061    HUMAN FACTORS                HUM FACTORS
062    FOXTROT                      FOXTROT
063    SOVIET                       SOVIET
064    DEFENSE                      DEFENSE
065    ONE                          ONE
066    INDIA                        INDIA
067    ADVANTAGES                   ADVANTAGES
068    GOLF                         GOLF
069    CANCEL                       CANCEL
070    ZULU                         ZULU
071    NEGATIVE                     NEGATIVE
072    PLOT ALL SUBMARINES          PLT ALL SUBS
073    XRAY                         XRAY
074    REFUEL                       REFUEL
075    AUTOMATIC RECOGNITION        AUTO RECOG
076    QUEBEC                       QUEBEC
077    TRACK ENEMY                  TRACK ENEMY
078    LEVEL TWO                    LEVEL TWO
079    COURSE                       COURSE
080    JOINT TASK FORCE             JT TSK FRC
081    SIX                          SIX
082    WHISKEY                      WHISKEY
083    ATTACK                       ATTACK
084    SIERRA                       SIERRA
085    MANEUVER DELAY               MNUVR DELAY
086    DISTANCE                     DISTANCE
087    EXECUTE                      EXECUTE
088    EIGHT                        EIGHT
089    VICTOR                       VICTOR
090    MEDITERRANEAN MAP            MED MAP
091    SEA OF JAPAN                 SEA OF JAPAN
092    POPPA                        POPPA
093    FILE TRANSFER PROTOCOL       FL TNSFR PRO
094    ALTITUDE                     ALTITUDE
095    HOTEL                        HOTEL
096    NUKE THEM TILL THEY GLOW     NUKE EM
097    ACCAT TITLE                  ACCAT TITLE
098    MIKE                         MIKE
099    MISSILE                      MISSILE


APPENDIX F
UTTERANCE LIST:   WEEK #2

WORD#  UTTERANCE

000    MISSILE
001    MIKE
002    ACCAT TITLE
003    NUKE THEM TILL THEY GLOW
004    HOTEL
005    ALTITUDE
006    FILE TRANSFER PROTOCOL
007    POPPA
008    SEA OF JAPAN
009    MEDITERRANEAN MAP
010    VICTOR
011    EIGHT
012    EXECUTE
013    DISTANCE
014    MANEUVER DELAY
015    SIERRA
016    ATTACK
017    WHISKEY
018    SIX
019    JOINT TASK FORCE
020    COURSE
021    LEVEL TWO
022    TRACK ENEMY
023    QUEBEC
024    AUTOMATIC RECOGNITION
025    REFUEL
026    XRAY
027    PLOT ALL SUBMARINES
028    NEGATIVE
029    ZULU
030    CANCEL
031    GOLF
032    ADVANTAGES
033    INDIA
034    ONE
035    DEFENSE
036    SOVIET
037    FOXTROT
038    HUMAN FACTORS
039    PACIFIC DATA BASE
040    NORTH ATLANTIC MAP
041    OSCAR
042    LOAD THE GANN
043    CLOSE OUT CHARLIE
044    TANGO
045    SENSOR
046    UNIFORM
047    SPEED
048    JULIETT
049    NEUTRAL
050    PROBABILITY OF DETECTION
051    NINE
052    UP IN DETAIL
053    OFFENSE
054    OPERATIONS PLAN
055    FIVE
056    TORPEDO
057    CHARLIE
058    AFFIRMATIVE
059    YANKEE
060    SHOOT
061    SATELLITE
062    MESSAGE
063    AVAILABLE
064    LIMA
065    AIRFIELD NAME
066    REPORT
067    GROUND CONTROL APPROACH
068    SEVEN
069    FLIGHT CONTROLLER
070    ROMEO
071    PROCEED
072    ENEMY DETECTION
073    PLACE A CIRCLE ON MOSCOW
074    BRAVO
075    FOUR
076    UNITED STATES
077    TWO
078    NOVEMBER
079    ECHO
080    LOGIN YELLEN
081    KILO
082    TASK FORCE COMMANDER
083    DELTA
084    RELOCATE
085    LAUNCH
086    IDENTIFICATION
087    POSITIVE
088    ALPHA
089    CHANGE DIRECTORY TO POOCK
090    ZERO
091    KOREA
092    TIME
093    STRAIT OF HORMUZ
094    COMMAND
095    LOGOUT
096    CARRIAGE RETURN
097    MOVE IT LEFT
098    EUROPE
099    THREE

APPENDIX G
UTTERANCE LIST:   WEEK #3

WORD#  UTTERANCE

000    CARRIAGE RETURN
001    STRAIT OF HORMUZ
002    ZERO
003    POSITIVE
004    RELOCATE
005    KILO
006    NOVEMBER
007    FOUR
008    ENEMY DETECTION
009    FLIGHT CONTROLLER
010    REPORT
011    AVAILABLE
012    SHOOT
013    CHARLIE
014    OPERATIONS PLAN
015    NINE
016    JULIETT
017    SENSOR
018    LOAD THE GANN
019    PACIFIC DATA BASE
020    SOVIET
021    INDIA
022    CANCEL
023    PLOT ALL SUBMARINES
024    AUTOMATIC RECOGNITION
025    LEVEL TWO
026    SIX
027    SIERRA
028    EXECUTE
029    MEDITERRANEAN MAP
030    FILE TRANSFER PROTOCOL
031    NUKE THEM TILL THEY GLOW
032    MISSILE
033    MOVE IT LEFT
034    COMMAND
035    KOREA
036    ALPHA
037    LAUNCH
038    TASK FORCE COMMANDER
039    ECHO
040    UNITED STATES
041    PLACE A CIRCLE ON MOSCOW
042    ROMEO
043    GROUND CONTROL APPROACH
044    LIMA
045    SATELLITE
046    AFFIRMATIVE
047    FIVE
048    UP IN DETAIL
049    NEUTRAL
050    UNIFORM
051    CLOSE OUT CHARLIE
052    NORTH ATLANTIC MAP
053    FOXTROT
054    ONE
055    GOLF
056    NEGATIVE
057    REFUEL
058    TRACK ENEMY
059    JOINT TASK FORCE
060    ATTACK
061    DISTANCE
062    VICTOR
063    POPPA
064    HOTEL
065    MIKE
066    EUROPE
067    LOGOUT
068    TIME
069    CHANGE DIRECTORY TO POOCK
070    IDENTIFICATION
071    DELTA
072    LOGIN YELLEN
073    THREE
074    TWO
075    BRAVO
076    PROCEED
077    SEVEN
078    AIRFIELD NAME
079    MESSAGE
080    YANKEE
081    TORPEDO
082    OFFENSE
083    PROBABILITY OF DETECTION
084    SPEED
085    TANGO
086    OSCAR
087    HUMAN FACTORS
088    DEFENSE
089    ADVANTAGES
090    ZULU
091    XRAY
092    QUEBEC
093    COURSE
094    WHISKEY
095    MANEUVER DELAY
096    EIGHT
097    SEA OF JAPAN
098    ALTITUDE
099    ACCAT TITLE

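APPENDIX H
DATA COLLECTION FORM

NAME:               RANK:               SEX:  M  F

SUBJECT #:

DAY/TIME:  [TRIALS 1-2]
           [TRIALS 3-4]
           [TRIALS 5-6]

WEEK#:  1  2  3

MICROPHONE:   EXPERIENCED    NON-EXPERIENCED
TRAINING:     SUPERVISED     NON-SUPERVISED

| UTTERANCE            |          TRIAL #          |
|                      | 1 | 2 | 3 | 4 | 5 | 6 |
| THREE                |   |   |   |   |   |   |
| EUROPE               |   |   |   |   |   |   |
| MOVE IT LEFT         |   |   |   |   |   |   |
| CARRIAGE RETURN      |   |   |   |   |   |   |
| LOGOUT               |   |   |   |   |   |   |
| COMMAND              |   |   |   |   |   |   |
| STRAIT OF HORMUZ     |   |   |   |   |   |   |
| TIME                 |   |   |   |   |   |   |
| KOREA                |   |   |   |   |   |   |
| ZERO                 |   |   |   |   |   |   |
| CHG DIR TO POOCK     |   |   |   |   |   |   |
| ALPHA                |   |   |   |   |   |   |
| POSITIVE             |   |   |   |   |   |   |
| IDENTIFICATION       |   |   |   |   |   |   |
| LAUNCH               |   |   |   |   |   |   |
| RELOCATE             |   |   |   |   |   |   |
| DELTA                |   |   |   |   |   |   |
| TASK FORCE CMDR      |   |   |   |   |   |   |
| KILO                 |   |   |   |   |   |   |
| LOGIN YELLEN         |   |   |   |   |   |   |
| ECHO                 |   |   |   |   |   |   |
| NOVEMBER             |   |   |   |   |   |   |
| TWO                  |   |   |   |   |   |   |
| UNITED STATES        |   |   |   |   |   |   |
| FOUR                 |   |   |   |   |   |   |
| BRAVO                |   |   |   |   |   |   |
| PL CIRCLE ON MOSCOW  |   |   |   |   |   |   |
| ENEMY DETECTION      |   |   |   |   |   |   |
| PROCEED              |   |   |   |   |   |   |
| ROMEO                |   |   |   |   |   |   |
| FLIGHT CONTROLLER    |   |   |   |   |   |   |
| SEVEN                |   |   |   |   |   |   |
| GRND CTRL APPROACH   |   |   |   |   |   |   |
| REPORT               |   |   |   |   |   |   |
| AIRFIELD NAME        |   |   |   |   |   |   |
| LIMA                 |   |   |   |   |   |   |
| AVAILABLE            |   |   |   |   |   |   |
| MESSAGE              |   |   |   |   |   |   |
| SATELLITE            |   |   |   |   |   |   |
| SHOOT                |   |   |   |   |   |   |
| YANKEE               |   |   |   |   |   |   |
| AFFIRMATIVE          |   |   |   |   |   |   |
| CHARLIE              |   |   |   |   |   |   |
| TORPEDO              |   |   |   |   |   |   |
| FIVE                 |   |   |   |   |   |   |
| OPERATIONS PLAN      |   |   |   |   |   |   |
| OFFENSE              |   |   |   |   |   |   |
| UP IN DETAIL         |   |   |   |   |   |   |
| NINE                 |   |   |   |   |   |   |
| PROB. OF DETECTION   |   |   |   |   |   |   |
| NEUTRAL              |   |   |   |   |   |   |
| JULIETT              |   |   |   |   |   |   |
| SPEED                |   |   |   |   |   |   |
| UNIFORM              |   |   |   |   |   |   |
| SENSOR               |   |   |   |   |   |   |
| TANGO                |   |   |   |   |   |   |
| CLOSE OUT CHARLIE    |   |   |   |   |   |   |
| LOAD THE GANN        |   |   |   |   |   |   |
| OSCAR                |   |   |   |   |   |   |
| NORTH ATLANTIC MAP   |   |   |   |   |   |   |
| PACIFIC DATA BASE    |   |   |   |   |   |   |
| HUMAN FACTORS        |   |   |   |   |   |   |
| FOXTROT              |   |   |   |   |   |   |
| SOVIET               |   |   |   |   |   |   |
| DEFENSE              |   |   |   |   |   |   |
| ONE                  |   |   |   |   |   |   |
| INDIA                |   |   |   |   |   |   |
| ADVANTAGES           |   |   |   |   |   |   |
| GOLF                 |   |   |   |   |   |   |
| CANCEL               |   |   |   |   |   |   |
| ZULU                 |   |   |   |   |   |   |
| NEGATIVE             |   |   |   |   |   |   |
| PLOT ALL SUBMARINES  |   |   |   |   |   |   |
| XRAY                 |   |   |   |   |   |   |
| REFUEL               |   |   |   |   |   |   |
| AUTO RECOGNITION     |   |   |   |   |   |   |
| QUEBEC               |   |   |   |   |   |   |
| TRACK ENEMY          |   |   |   |   |   |   |
| LEVEL TWO            |   |   |   |   |   |   |
| COURSE               |   |   |   |   |   |   |
| JOINT TASK FORCE     |   |   |   |   |   |   |
| SIX                  |   |   |   |   |   |   |
| WHISKEY              |   |   |   |   |   |   |
| ATTACK               |   |   |   |   |   |   |
| SIERRA               |   |   |   |   |   |   |
| MANEUVER DELAY       |   |   |   |   |   |   |
| DISTANCE             |   |   |   |   |   |   |
| EXECUTE              |   |   |   |   |   |   |
| EIGHT                |   |   |   |   |   |   |
| VICTOR               |   |   |   |   |   |   |
| MEDITERRANEAN MAP    |   |   |   |   |   |   |
| SEA OF JAPAN         |   |   |   |   |   |   |
| POPPA                |   |   |   |   |   |   |
| FILE TNSFR PROTOCOL  |   |   |   |   |   |   |
| ALTITUDE             |   |   |   |   |   |   |
| HOTEL                |   |   |   |   |   |   |
| NUKE TILL THEY GLOW  |   |   |   |   |   |   |
| ACCAT TITLE          |   |   |   |   |   |   |
| MIKE                 |   |   |   |   |   |   |
| MISSILE              |   |   |   |   |   |   |

DATA REDUCTION

| # NON-RECOGNITIONS   |   |   |   |   |   |   |
| # MIS-RECOGNITIONS   |   |   |   |   |   |   |
| # TOTAL ERRORS       |   |   |   |   |   |   |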
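The data reduction rows of the form can be illustrated with a short sketch. It rests on two assumptions that the form itself does not state: that total errors for a trial are the sum of non-recognitions and mis-recognitions, and that each trial covers the full 100-utterance vocabulary. The function names and the sample counts are hypothetical.

```python
def reduce_trial(non_recognitions, mis_recognitions, utterances=100):
    """Return (total_errors, percent_error) for one trial.

    Assumption: every error is either a non-recognition ('beep')
    or a mis-recognition, so the total is their sum.
    """
    if min(non_recognitions, mis_recognitions) < 0:
        raise ValueError("error counts cannot be negative")
    total_errors = non_recognitions + mis_recognitions
    if total_errors > utterances:
        raise ValueError("more errors than utterances spoken")
    return total_errors, 100.0 * total_errors / utterances


def mean_percent_error(trials, utterances=100):
    """Mean percent error over (non_recognitions, mis_recognitions) pairs."""
    if not trials:
        raise ValueError("at least one trial is required")
    rates = [reduce_trial(n, m, utterances)[1] for n, m in trials]
    return sum(rates) / len(rates)


# Hypothetical counts for three trials; not data from the experiment.
sample = [(2, 1), (0, 3), (1, 0)]
print(reduce_trial(2, 1))                    # prints (3, 3.0)
print(round(mean_percent_error(sample), 2))  # prints 2.33
```

Averaging per-trial rates in this way gives one mean percent error per subject.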


APPENDIX I
MASTER LIST OF UTTERANCES


1.   ONE SYLLABLE UTTERANCES (15)

ONE
TWO
THREE
FOUR
FIVE
SIX
EIGHT
NINE
GOLF
MIKE
LAUNCH
TIME
SHOOT
SPEED
COURSE


2.   TWO SYLLABLE UTTERANCES (35)

EUROPE
LOGOUT
ZERO
SEVEN
ALPHA
BRAVO
CHARLIE
DELTA
ECHO
FOXTROT
HOTEL
KILO
LIMA
OSCAR
POPPA
QUEBEC
TANGO
VICTOR
WHISKEY
XRAY
YANKEE
ZULU
COMMAND
REPORT
OFFENSE
DEFENSE
ATTACK
PROCEED
CANCEL
MESSAGE
DISTANCE
NEUTRAL
MISSILE
SENSOR
REFUEL


3.   THREE SYLLABLE UTTERANCES (20)

MOVE IT LEFT
SOVIET
JOINT TASK FORCE
NOVEMBER
JULIETT
ROMEO
SIERRA
INDIA
UNIFORM
KOREA
NEGATIVE
POSITIVE
EXECUTE
AIRFIELD NAME
ALTITUDE
RELOCATE
LOAD THE GANN
LEVEL TWO
SATELLITE
TORPEDO


4.   FOUR SYLLABLE UTTERANCES (14)

CARRIAGE RETURN
LOGIN YELLEN
STRAIT OF HORMUZ
UNITED STATES
FLIGHT CONTROLLER
AVAILABLE
AFFIRMATIVE
UP IN DETAIL
CLOSE OUT CHARLIE
HUMAN FACTORS
ADVANTAGES
TRACK ENEMY
SEA OF JAPAN
ACCAT TITLE


5.   UTTERANCES GREATER THAN OR EQUAL TO 5 SYLLABLES (16)

MANEUVER DELAY
CHANGE DIRECTORY TO POOCK
IDENTIFICATION
TASK FORCE COMMANDER
PLACE A CIRCLE ON MOSCOW
GROUND CONTROL APPROACH
ENEMY DETECTION
NORTH ATLANTIC MAP
MEDITERRANEAN MAP
PROBABILITY OF DETECTION
OPERATIONS PLAN
PACIFIC DATA BASE
PLOT ALL SUBMARINES
AUTOMATIC RECOGNITION
FILE TRANSFER PROTOCOL
NUKE THEM TILL THEY GLOW


APPENDIX  J 
INDIVIDUAL  SUBJECT  RECOGNITION  RATES 

The following are mean error rates for each subject
participating in the experiment. The data are
partitioned to mirror the groups established in the
overall experimental design and are expressed in percent
error.

GROUP I  GROUP II

4.6b  13.11 
7.17  y.£2 

7.3b>  s.cy 

4.3y  6.39 

y  . cc  5 . 22 

t  .44  6. £9 

b  .06 
1.61 
2.89 
2.61 


6.72
4.06
2.00
1.67

GROUP III

4.06 

2.11 

.52 

6.y4 

y  .28 

A.       "^ 

1   <bb 

t  .72 

S  .22 

4.5^ 

£  .b»4 

2.61 

GROUP    IV 

10.11 

15.17 

4.69 

lb. 72 

8.06 

y  .06 

8.44 
6.28 
2.3y 
7.11 
4.23 
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